PREFACE TO THE THIRD EDITION


The Third Edition contains some new material. More specifically, the chapter on large sam- 
ple theory has been reorganized, repositioned, and re-titled in recognition of the growing 
role of asymptotic statistics. In Chapter 12 on General Linear Hypothesis, the section on 
regression analysis has been greatly expanded to include multiple regression and logistic 
and Poisson regression. 

Some more problems and remarks have been added to illustrate the material covered. 
The basic character of the book, however, remains the same as enunciated in the Preface to 
the first edition. It remains a solid introduction to first-year graduate students or advanced 
seniors in mathematics and statistics as well as a reference to students and researchers in 
other sciences. 

We are grateful to the readers for their comments on this book over the past 40 years 
and would welcome any questions, comments, and suggestions. You can communicate
with Vijay K. Rohatgi at vrohatg@bgsu.edu and with A. K. Md. Ehsanes Saleh at
esaleh@math.carleton.ca.


Solana Beach, CA VIJAY K. ROHATGI
Ottawa, Canada A. K. Md. EHSANES SALEH 


PREFACE TO THE SECOND EDITION 


There is a lot that is different about this second edition. First, there is a co-author without 
whose help this revision would not have been possible. Second, we have benefited from 
countless letters from readers and colleagues who have pointed out errors and omissions 
and have made valuable suggestions over the past 25 years. These communications make 
this revision worth the effort. Third, we have tried to update the content of the book while 
striving to preserve the character and spirit of the first edition. 

Here are some of the numerous changes that have been made. 


1. The Introduction section has been removed. We have also removed Chapter 14 on
sequential statistical inference.

2. Many parts of the book have undergone substantial rewriting. For example, Chapter 4 has
many changes, such as inclusion of exchangeability. In Chapter 3, an introduction to
characteristic functions has been added. In Chapter 5 some new distributions have
been added while in Chapter 6 there have been many changes in proofs.

3. The statistical inference part of the book (Chapters 8 to 13) has been updated.
Thus in Chapter 8 we have expanded the coverage of invariance and have included
discussions of ancillary statistics and conjugate prior distributions.

4. Similar changes have been made in Chapter 9. A new section on locally most
powerful tests has been added.

5. Chapter 11 has been greatly revised and a discussion of invariant confidence
intervals has been added.

6. Chapter 13 has been completely rewritten in the light of increased emphasis on
nonparametric inference. We have expanded the discussion of U-statistics. Later
sections show the connection between commonly used tests and U-statistics.

7. In Chapter 12, the notation has been changed to conform to the current convention.

8. Many problems and examples have been added.

9. More figures have been added to illustrate examples and proofs.

10. Answers to selected problems have been provided.


We are truly grateful to the readers of the first edition for countless comments and 
suggestions and hope we will continue to hear from them about this edition. 

Special thanks are due Ms. Gillian Murray for her superb word processing of the 
manuscript, and Dr. Indar Bhatia for figures that appear in the text. Dr. Bhatia spent count- 
less hours preparing the diagrams for publication. We also acknowledge the assistance of 
Dr. K. Selvavel. 


VIJAY K. ROHATGI
A. K. Md. EHSANES SALEH 


PREFACE TO THE FIRST EDITION 


This book on probability theory and mathematical statistics is designed for a three-quarter 
course meeting 4 hours per week or a two-semester course meeting 3 hours per week. It is 
designed primarily for advanced seniors and beginning graduate students in mathematics, 
but it can also be used by students in physics and engineering with strong mathematical 
backgrounds. Let me emphasize that this is a mathematics text and not a “cookbook.” It 
should not be used as a text for service courses. 

The mathematics prerequisites for this book are modest. It is assumed that the reader has 
had basic courses in set theory and linear algebra and a solid course in advanced calculus. 
No prior knowledge of probability and/or statistics is assumed. 

My aim is to provide a solid and well-balanced introduction to probability theory and 
mathematical statistics. It is assumed that students who wish to do graduate work in prob- 
ability theory and mathematical statistics will be taking, concurrently with this course, a 
measure-theoretic course in analysis if they have not already had one. These students can 
go on to take advanced-level courses in probability theory or mathematical statistics after 
completing this course. 

This book consists of essentially three parts, although no such formal divisions are des- 
ignated in the text. The first part consists of Chapters 1 through 6, which form the core of 
the probability portion of the course. The second part, Chapters 7 through 11, covers the 
foundations of statistical inference. The third part consists of the remaining three chapters 
on special topics. For course sequences that separate probability and mathematical statis- 
tics, the first part of the book can be used for a course in probability theory, followed by 
a course in mathematical statistics based on the second part and, possibly, one or more 
chapters on special topics. 

The reader will find here a wealth of material. Although the topics covered are fairly 
conventional, the discussions and special topics included are not. Many presentations give
far more depth than is usually the case in a book at this level. Some special features of the
book are the following: 


1. A well-referenced chapter on the preliminaries.

2. About 550 problems, over 350 worked-out examples, about 200 remarks, and about
150 references.

3. An advance warning to the reader wherever the details become too involved. They can
skip the later portion of the section in question on first reading without destroying
the continuity in any way.

4. Many results on characterizations of distributions (Chapter 5).

5. Proof of the central limit theorem by the method of operators and proof of the
strong law of large numbers (Chapter 6).

6. A section on minimal sufficient statistics (Chapter 8).

7. A chapter on special tests (Chapter 10).

8. A careful presentation of the theory of confidence intervals, including Bayesian
intervals and shortest-length confidence intervals (Chapter 11).

9. A chapter on the general linear hypothesis, which carries linear models through to
their use in basic analysis of variance (Chapter 12).

10. Sections on nonparametric estimation and robustness (Chapter 13).

11. Two sections on sequential estimation (Chapter 14).


The contents of this book were used in a 1-year (two-semester) course that I taught three 
times at the Catholic University of America and once in a three-quarter course at Bowling 
Green State University. In the fall of 1973 my colleague, Professor Eugene Lukacs, taught 
the first quarter of this same course on the basis of my notes, which eventually became 
this book. I have always been able to cover this book (with few omissions) in a 1-year 
course, lecturing 3 hours a week. An hour-long problem session every week is conducted 
by a senior graduate student. 

In a book of this size there are bound to be some misprints, errors, and ambiguities of 
presentation. I shall be grateful to any reader who brings these to my attention. 


Bowling Green, Ohio V. K. ROHATGI 
February 1975


ENUMERATION OF THEOREMS
AND REFERENCES


This book is divided into 13 chapters, numbered 1 through 13. Each chapter is divided
into several sections. Lemmas, theorems, equations, definitions, remarks, figures, and so
on, are numbered consecutively within each section. Thus Theorem i.j.k refers to the kth
theorem in Section j of Chapter i, Section i.j refers to the jth section of Chapter i, and
so on. Theorem j refers to the jth theorem of the section in which it appears. A similar 
convention is used for equations except that equation numbers are enclosed in parenthe- 
ses. Each section is followed by a set of problems for which the same numbering system 
is used. 

References are given at the end of the book and are denoted in the text by numbers 
enclosed in square brackets, [ ]. If a citation is to a book, the notation ([i, p. j]) refers to
the jth page of the reference numbered [i]. 

A word about the proofs of results stated without proof in this book. If a reference 
appears immediately following or preceding the statement of a result, it generally means 
that the proof is beyond the scope of this text. If no reference is given, it indicates that the 
proof is left to the reader. Sometimes the reader is asked to supply the proof as a problem. 


PROBABILITY 


1.1 INTRODUCTION 


The theory of probability had its origin in gambling and games of chance. It owes much 
to the curiosity of gamblers who pestered their friends in the mathematical world with all 
sorts of questions. Unfortunately this association with gambling contributed to a very slow 
and sporadic growth of probability theory as a mathematical discipline. The mathemati- 
cians of the day took little or no interest in the development of any theory but looked only 
at the combinatorial reasoning involved in each problem. 

The first attempt at some mathematical rigor is credited to Laplace. In his monumental 
work, Théorie analytique des probabilités (1812), Laplace gave the classical definition of
the probability of an event that can occur only in a finite number of ways as the proportion 
of the number of favorable outcomes to the total number of all possible outcomes, provided 
that all the outcomes are equally likely. According to this definition, the computation of 
the probability of events was reduced to combinatorial counting problems. Even in those 
days, this definition was found inadequate. In addition to being circular and restrictive, 
it did not answer the question of what probability is; it only gave a practical method of
computing the probabilities of some simple events. 

An extension of the classical definition of Laplace was used to evaluate the probabilities 
of sets of events with infinitely many outcomes. The notion of equal likelihood of certain
events played a key role in this development. According to this extension, if Ω is some
region with a well-defined measure (length, area, volume, etc.), the probability that a point
chosen at random lies in a subregion A of Ω is the ratio measure(A)/measure(Ω). Many
problems of geometric probability were solved using this extension. The trouble is that one
can define “at random” in any way one pleases, and different definitions therefore lead to
different answers. Joseph Bertrand, for example, in his book Calcul des probabilités (Paris,
1889) cited a number of problems in geometric probability where the result depended 
on the method of solution. In Example 9 we will discuss the famous Bertrand paradox 
and show that in reality there is nothing paradoxical about Bertrand’s paradoxes; once 
we define “probability spaces” carefully, the paradox is resolved. Nevertheless difficul- 
ties encountered in the field of geometric probability have been largely responsible for 
the slow growth of probability theory and its tardy acceptance by mathematicians as a 
mathematical discipline. 

The mathematical theory of probability, as we know it today, is of comparatively recent 
origin. It was A. N. Kolmogorov who axiomatized probability in his fundamental work, 
Foundations of the Theory of Probability (Berlin), in 1933. According to this development, 
random events are represented by sets and probability is just a normed measure defined on 
these sets. This measure-theoretic development not only provided a logically consistent 
foundation for probability theory but also, at the same time, joined it to the mainstream of 
modern mathematics. 

In this book we follow Kolmogorov’s axiomatic development. In Section 1.2 we intro- 
duce the notion of a sample space. In Section 1.3 we state Kolmogorov’s axioms of 
probability and study some simple consequences of these axioms. Section 1.4 is devoted to 
the computation of probability on finite sample spaces. Section 1.5 deals with conditional 
probability and Bayes’s rule while Section 1.6 examines the independence of events. 


1.2 SAMPLE SPACE 


In most branches of knowledge, experiments are a way of life. In probability and statis- 
tics, too, we concern ourselves with special types of experiments. Consider the following 
examples. 


Example 1. A coin is tossed. Assuming that the coin does not land on the side, there are 
two possible outcomes of the experiment: heads and tails. On any performance of this 
experiment one does not know what the outcome will be. The coin can be tossed as many 
times as desired. 


Example 2. A roulette wheel is a circular disk divided into 38 equal sectors numbered 
from 0 to 36 and 00. A ball is rolled on the edge of the wheel, and the wheel is rolled 
in the opposite direction. One bets on any of the 38 numbers or some combinations of 
them. One can also bet on a color, red or black. If the ball lands in the sector numbered 
32, say, anybody who bet on 32 or combinations including 32 wins, and so on. In this 
experiment, all possible outcomes are known in advance, namely 00, 0, 1, 2,...,36, but 
on any performance of the experiment there is uncertainty as to what the outcome will be, 
provided, of course, that the wheel is not rigged in any manner. Clearly, the wheel can be 
rolled any number of times. 


Example 3. A manufacturer produces footrules. The experiment consists in measuring 
the length of a footrule produced by the manufacturer as accurately as possible. Because
of errors in the production process one does not know what the true length of the footrule
selected will be. It is clear, however, that the length will be, say, between 11 and 13 in., 
or, if one wants to be safe, between 6 and 18 in. 


Example 4. The length of life of a light bulb produced by a certain manufacturer is 
recorded. In this case one does not know what the length of life will be for the light bulb 
selected, but clearly one is aware in advance that it will be some number between 0 and 
∞ hours.


The experiments described above have certain common features. For each experiment, 
we know in advance all possible outcomes, that is, there are no surprises in store after the 
performance of any experiment. On any performance of the experiment, however, we do 
not know what the specific outcome will be, that is, there is uncertainty about the outcome 
on any performance of the experiment. Moreover, the experiment can be repeated under 
identical conditions. These features describe a random (or a statistical) experiment. 


Definition 1. A random (or a statistical) experiment is an experiment in which 


(a) all outcomes of the experiment are known in advance, 


(b) any performance of the experiment results in an outcome that is not known in 
advance, and 


(c) the experiment can be repeated under identical conditions. 


In probability theory we study this uncertainty of a random experiment. It is convenient 
to associate with each such experiment a set Ω, the set of all possible outcomes of the
experiment. To engage in any meaningful discussion about the experiment, we associate
with Ω a σ-field 𝒮 of subsets of Ω. We recall that a σ-field is a nonempty class of subsets
of Ω that is closed under the formation of countable unions and complements and contains
the null set ∅.
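For a finite Ω, the σ-field conditions recalled above can be checked mechanically. The following is a minimal sketch, not part of the text: the function name `is_sigma_field` and the sample sets are illustrative assumptions, and the check relies on the fact that on a finite set closure under pairwise unions already gives closure under countable unions.

```python
from itertools import combinations

def is_sigma_field(omega, family):
    """Check the sigma-field conditions for a class of subsets of a finite set."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in family}
    if not sets:
        return False                      # a sigma-field is nonempty
    if any(not s <= omega for s in sets):
        return False                      # every member must be a subset of omega
    if any(omega - s not in sets for s in sets):
        return False                      # closed under complements
    # On a finite set, pairwise unions suffice for countable unions.
    return all(a | b in sets for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, omega]     # hypothetical example class
bad = [set(), {1}, omega]                 # the complement {2, 3, 4} is missing

print(is_sigma_field(omega, good), is_sigma_field(omega, bad))
```

Note that the null set and Ω need not be tested separately: a nonempty class closed under complements and unions contains both.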


Definition 2. The sample space of a statistical experiment is a pair (Ω, 𝒮), where


(a) Ω is the set of all possible outcomes of the experiment and
(b) 𝒮 is a σ-field of subsets of Ω.


The elements of Ω are called sample points. Any set A ∈ 𝒮 is known as an event. Clearly
A is a collection of sample points. We say that an event A happens if the outcome of the
experiment corresponds to a point in A. Each one-point set is known as a simple or an
elementary event. If the set Ω contains only a finite number of points, we say that (Ω, 𝒮) is
a finite sample space. If Ω contains at most a countable number of points, we call (Ω, 𝒮)
a discrete sample space. If, however, Ω contains uncountably many points, we say that
(Ω, 𝒮) is an uncountable sample space. In particular, if Ω = ℛₖ, or some rectangle in ℛₖ,
we call it a continuous sample space.


Remark 1. The choice of 𝒮 is an important one, and some remarks are in order. If Ω
contains at most a countable number of points, we can always take 𝒮 to be the class of all
subsets of Ω. This is certainly a σ-field. Each one-point set is a member of 𝒮 and is the
fundamental object of interest. Every subset of Ω is an event. If Ω has uncountably many
points, the class of all subsets of Ω is still a σ-field, but it is much too large a class of
sets to be of interest. It may not be possible to choose the class of all subsets of Ω as 𝒮.
One of the most important examples of an uncountable sample space is the case in which
Ω = ℛ or Ω is an interval in ℛ. In this case we would like all one-point subsets of Ω and all
intervals (closed, open, or semiclosed) to be events. We use our knowledge of analysis to
specify 𝒮. We will not go into details here except to recall that the class of all semiclosed
intervals (a,b] generates a class 𝔅₁, which is a σ-field on ℛ. This class contains all
one-point sets and all intervals (finite or infinite). We take 𝒮 = 𝔅₁. Since we will be dealing
mostly with the one-dimensional case, we will write 𝔅 instead of 𝔅₁. There are many
subsets of ℛ that are not in 𝔅₁, but we will not demonstrate this fact here. We refer the
reader to Halmos [42], Royden [96], or Kolmogorov and Fomin [54] for further details.


Example 5. Let us toss a coin. The set Ω is the set of symbols H and T, where H
denotes head and T represents tail. Also, 𝒮 is the class of all subsets of Ω, namely,
{{H}, {T}, {H, T}, ∅}. If the coin is tossed two times, then


\[
\begin{aligned}
\Omega &= \{(H,H), (H,T), (T,H), (T,T)\},\\
\mathcal{S} &= \{\emptyset, \{(H,H)\}, \{(H,T)\}, \{(T,H)\}, \{(T,T)\},\\
&\qquad \{(H,H),(H,T)\}, \{(H,H),(T,H)\}, \{(H,H),(T,T)\},\\
&\qquad \{(H,T),(T,H)\}, \{(T,T),(T,H)\}, \{(T,T),(H,T)\},\\
&\qquad \{(H,H),(H,T),(T,H)\}, \{(H,H),(H,T),(T,T)\},\\
&\qquad \{(H,H),(T,H),(T,T)\}, \{(H,T),(T,H),(T,T)\}, \Omega\},
\end{aligned}
\]


where the first element of a pair denotes the outcome of the first toss and the second 
element, the outcome of the second toss. The event at least one head consists of sample 
points (H, H), (H,T), (T, H). The event at most one head is the collection of sample points 
(H,T), (T,H), (T,T). 
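The two-toss sample space and its class of all subsets are small enough to enumerate directly. The sketch below is illustrative only; the helper name `power_set` and the variable names are assumptions, not notation from the text.

```python
from itertools import combinations, product

# Sample points for two tosses: ordered pairs (first toss, second toss).
omega = list(product("HT", repeat=2))

def power_set(points):
    """All subsets of a finite collection of sample points."""
    return [frozenset(c)
            for r in range(len(points) + 1)
            for c in combinations(points, r)]

events = power_set(omega)    # the class of all subsets of omega

# Events are collections of sample points.
at_least_one_head = {w for w in omega if w.count("H") >= 1}
at_most_one_head = {w for w in omega if w.count("H") <= 1}

print(len(omega), len(events))
print(sorted(at_least_one_head))
print(sorted(at_most_one_head))
```

With four sample points the class of all subsets has 2^4 = 16 members, which agrees with the list of events written out in the example.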


Example 6. A die is rolled n times. The sample space is the pair (Ω, 𝒮), where Ω is the
set of all n-tuples $(x_1, x_2, \ldots, x_n)$, $x_i \in \{1,2,3,4,5,6\}$, $i = 1,2,\ldots,n$, and 𝒮 is the
class of all subsets of Ω. Ω contains $6^n$ elementary events. The event A that 1 shows at
least once is the set


\[
\begin{aligned}
A &= \{(x_1, x_2, \ldots, x_n) : \text{at least one of the } x_i\text{'s is } 1\}\\
&= \Omega - \{(x_1, x_2, \ldots, x_n) : \text{none of the } x_i\text{'s is } 1\}\\
&= \Omega - \{(x_1, x_2, \ldots, x_n) : x_i \in \{2,3,4,5,6\},\ i = 1,2,\ldots,n\}.
\end{aligned}
\]
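The description of A as a complement gives a quick count of its sample points: the whole space has 6^n points and the set removed has 5^n. A brute-force check for small n, written as a sketch with a hypothetical helper name, is:

```python
from itertools import product

def count_at_least_one_ace(n):
    """Count n-tuples of die faces in which the face 1 appears at least once."""
    faces = range(1, 7)
    return sum(1 for x in product(faces, repeat=n) if 1 in x)

for n in (1, 2, 3, 4):
    # Complement: tuples with every coordinate in {2, 3, 4, 5, 6}.
    assert count_at_least_one_ace(n) == 6**n - 5**n
    print(n, count_at_least_one_ace(n))
```

The enumeration is exponential in n and is meant only to confirm the set identity on small cases.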


Example 7. A coin is tossed until the first head appears. Then

\[ \Omega = \{H, (T,H), (T,T,H), (T,T,T,H), \ldots\}, \]

and 𝒮 is the class of all subsets of Ω. An equivalent way of writing Ω would be to look
at the number of tosses required for the first head. Clearly, this number can take values
1,2,3,..., so that Ω is the set of all positive integers. Then 𝒮 is the class of all subsets of
positive integers.


Example 8. Consider a pointer that is free to spin about the center of a circle. If the pointer 
is spun by an impulse, it will finally come to rest at some point. On the assumption that 
the mechanism is not rigged in any manner, each point on the circumference is a possible 
outcome of the experiment. The set Ω consists of all points 0 ≤ x < 2πr, where r is the
radius of the circle. Every one-point set {x} is a simple event, namely, that the pointer
will come to rest at x. The events of interest are those in which the pointer stops at a point
belonging to a specified arc. Here 𝒮 is taken to be the Borel σ-field of subsets of [0, 2πr).


Example 9. A rod of length l is thrown onto a flat table, which is ruled with parallel lines
at distance 2l. The experiment consists in noting whether the rod intersects one of the ruled
lines.

Let r denote the distance from the center of the rod to the nearest ruled line, and let θ
be the angle that the axis of the rod makes with this line (Fig. 1). Every outcome of this
experiment corresponds to a point (r, θ) in the plane. As Ω we take the set of all points
(r, θ) in {(r, θ): 0 ≤ r ≤ l, 0 ≤ θ < π}. For 𝒮 we take the Borel σ-field, 𝔅₂, of subsets of
Ω, that is, the smallest σ-field generated by rectangles of the form


\[ \{(x, y) : a < x \le b,\ c < y \le d,\ 0 \le a < b \le l,\ 0 \le c < d < \pi\}. \]


Clearly the rod will intersect a ruled line if and only if the center of the rod lies in the area 
enclosed by the locus of the center of the rod (while one end touches the nearest line) and 
the nearest line (shaded area in Fig. 2). 


Fig. 1 (labels: l/2, 2l, l/2)

Fig. 2 (axis marks: π/2, π)


Remark 2. From the discussion above it should be clear that in the discrete case there is
really no problem. Every one-point set is also an event, and 𝒮 is the class of all subsets of Ω.
The problem, if there is any, arises only in regard to uncountable sample spaces. The reader
has to remember only that in this case not all subsets of Ω are events. The case of most
interest is the one in which Ω = ℛₖ. In this case, roughly all sets that have a well-defined
volume (or area or length) are events. Not every set has the property in question, but sets
that lack it are not easy to find and one does not encounter them in practice.


PROBLEMS 1.2 


1. A club has five members A, B, C, D, and E. It is required to select a chairman and a 
secretary. Assuming that one member cannot occupy both positions, write the sam- 
ple space associated with these selections. What is the event that member A is an 
office holder? 

2. In each of the following experiments, what is the sample space? 

(a) In a survey of families with three children, the sexes of the children are recorded
in increasing order of age. 

(b) The experiment consists of selecting four items from a manufacturer’s output 
and observing whether or not each item is defective. 

(c) A given book is opened to any page, and the number of misprints is counted. 

(d) Two cards are drawn (i) with replacement and (ii) without replacement from an 
ordinary deck of cards. 

3. Let A, B, C be three arbitrary events on a sample space (Ω, 𝒮). What is the event that
only A occurs? What is the event that at least two of A, B, C occur? What is the event
that both A and C, but not B, occur? What is the event that at most one of A, B, C
occurs?


1.3 PROBABILITY AXIOMS 


Let (Ω, 𝒮) be the sample space associated with a statistical experiment. In this section we
define a probability set function and study some of its properties. 


Definition 1. Let (Ω, 𝒮) be a sample space. A set function P defined on 𝒮 is called a
probability measure (or simply probability) if it satisfies the following conditions:


(i) P(A) ≥ 0 for all A ∈ 𝒮.
(ii) P(Ω) = 1.
(iii) Let $\{A_j\}$, $A_j \in \mathcal{S}$, $j = 1,2,\ldots$, be a disjoint sequence of sets, that is,
$A_j \cap A_k = \emptyset$ for $j \ne k$, where ∅ is the null set. Then


\[ P\Big(\sum_{j=1}^{\infty} A_j\Big) = \sum_{j=1}^{\infty} P(A_j), \tag{1} \]


where we have used the notation $\sum_{j=1}^{\infty} A_j$ to denote the union of disjoint sets $A_j$.


We call P(A) the probability of event A. If there is no confusion, we will write PA
instead of P(A). Property (iii) is called countable additivity. That P∅ = 0 and P is also
finitely additive follows from it.


Remark 1. If Ω is discrete and contains at most n (< ∞) points, each single-point set
$\{\omega_j\}$, $j = 1,2,\ldots,n$, is an elementary event, and it is sufficient to assign probability to
each $\{\omega_j\}$. Then, if A ∈ 𝒮, where 𝒮 is the class of all subsets of Ω, $PA = \sum_{\omega \in A} P\{\omega\}$.
One such assignment is the equally likely assignment or the assignment of uniform
probabilities. According to this assignment, $P\{\omega_j\} = 1/n$, $j = 1,2,\ldots,n$. Thus PA = m/n if A
contains m elementary events, 1 ≤ m ≤ n.
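Under the equally likely assignment, computing PA reduces to counting. A minimal sketch follows; the function name `uniform_probability` and the die example are illustrative assumptions, and exact fractions are used so that m/n is not rounded.

```python
from fractions import Fraction

def uniform_probability(event, omega):
    """PA = m/n under the equally likely assignment on a finite omega."""
    omega = set(omega)
    event = set(event)
    if not event <= omega:
        raise ValueError("event must be a subset of omega")
    return Fraction(len(event), len(omega))

omega = range(1, 7)      # one roll of a die (illustrative choice)
even = {2, 4, 6}

print(uniform_probability(even, omega))
```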


Remark 2. If Ω is discrete and contains a countably infinite number of points, one cannot
make an equally likely assignment of probabilities. It suffices to make the assignment for
each elementary event. If A ∈ 𝒮, where 𝒮 is the class of all subsets of Ω, define
$PA = \sum_{\omega \in A} P\{\omega\}$.


Remark 3. If Ω contains uncountably many points, each one-point set is an elementary
event, and again one cannot make an equally likely assignment of probabilities. Indeed,
one cannot assign positive probability to each elementary event without violating the
axiom PΩ = 1. In this case one assigns probabilities to compound events consisting of
intervals. For example, if Ω = [0,1] and 𝒮 is the Borel σ-field of subsets of Ω, the
assignment P(I) = length of I, where I is a subinterval of Ω, defines a probability.


Definition 2. The triple (Ω, 𝒮, P) is called a probability space.


Definition 3. Let A ∈ 𝒮. We say that the odds for A are a to b if PA = a/(a+b), and then
the odds against A are b to a. 


In many games of chance, probability is often stated in terms of odds against an event.
Thus in horse racing a two dollar bet on a horse to win with odds of 2 to 1 (against) pays
approximately six dollars if the horse wins the race. In this case the probability of winning
is 1/3.
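The conversion between odds and probability in Definition 3 is a one-line computation. The helpers below are a sketch with hypothetical names; they only restate PA = a/(a+b) and its inverse.

```python
from fractions import Fraction

def prob_from_odds_for(a, b):
    """Odds for A are a to b, so PA = a / (a + b)."""
    return Fraction(a, a + b)

def prob_from_odds_against(b, a):
    """Odds against A are b to a, which means odds for A are a to b."""
    return prob_from_odds_for(a, b)

def odds_for(p):
    """Return (a, b) in lowest terms such that p = a / (a + b)."""
    p = Fraction(p)
    return p.numerator, p.denominator - p.numerator

# Horse-racing case from the text: odds of 2 to 1 against.
print(prob_from_odds_against(2, 1))
print(odds_for(Fraction(1, 3)))
```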


Example 1. Let us toss a coin. The sample space is (Ω, 𝒮), where Ω = {H, T}, and 𝒮 is
the σ-field of all subsets of Ω. Let us define P on 𝒮 as follows.


\[ P\{H\} = 1/2, \qquad P\{T\} = 1/2. \]


Then P clearly defines a probability. Similarly, P{H} = 2/3, P{T} = 1/3, and P{H} = 1,
P{T} = 0 are probabilities defined on 𝒮. Indeed,


\[ P\{H\} = p \quad\text{and}\quad P\{T\} = 1 - p \qquad (0 \le p \le 1) \]

defines a probability on (Ω, 𝒮).


Example 2. Let Ω = {1,2,3,...} be the set of positive integers, and let 𝒮 be the class of
all subsets of Ω. Define P on 𝒮 as follows:


\[ P\{i\} = \frac{1}{2^i}, \qquad i = 1,2,\ldots. \]

Then $\sum_{i=1}^{\infty} P\{i\} = 1$, and P defines a probability.
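That the assigned probabilities sum to 1 can be seen from the partial sums of the geometric series, which equal 1 - 1/2^n. A small sketch with exact arithmetic (the function name is an illustrative assumption):

```python
from fractions import Fraction

def partial_sum(n):
    """Sum of P{i} = 1/2**i for i = 1, ..., n."""
    return sum(Fraction(1, 2**i) for i in range(1, n + 1))

for n in (1, 2, 5, 10):
    # Each partial sum falls short of 1 by exactly 1/2**n.
    assert partial_sum(n) == 1 - Fraction(1, 2**n)
    print(n, partial_sum(n))
```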


Example 3. Let Ω = (0,∞) and 𝒮 = 𝔅, the Borel σ-field on Ω. Define P as follows: for
each interval I ⊆ Ω,

\[ PI = \int_I e^{-x}\,dx. \]


Clearly PI ≥ 0, PΩ = 1, and P is countably additive by properties of integrals.
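For an interval with endpoints a ≤ b the integral in this example has the closed form e^{-a} - e^{-b}, which makes the properties easy to check numerically. The sketch below assumes that closed form; the function name is hypothetical.

```python
import math

def prob_interval(a, b):
    """P(I) for an interval I with endpoints 0 <= a <= b (b may be math.inf)."""
    if not 0 <= a <= b:
        raise ValueError("need 0 <= a <= b")
    return math.exp(-a) - math.exp(-b)

# Total mass is 1.
print(prob_interval(0, math.inf))
# Additivity over adjacent intervals (0, 1] and (1, 3].
print(prob_interval(0, 1) + prob_interval(1, 3), prob_interval(0, 3))
```

Endpoints carry no mass under this assignment, so the same value is obtained for open, closed, and semiclosed intervals.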


Theorem 1. P is monotone and subtractive; that is, if A, B ∈ 𝒮 and A ⊆ B, then PA ≤ PB
and P(B − A) = PB − PA, where $B - A = B \cap A^c$, $A^c$ being the complement of the event A.


Proof. If A ⊆ B, then

\[ B = (A \cap B) + (B - A) = A + (B - A), \]

and it follows that PB = PA + P(B − A).


Corollary. For all A ∈ 𝒮, 0 ≤ PA ≤ 1.


Remark 4. We wish to emphasize that, if PA = 0 for some A ∈ 𝒮, we call A an event with
zero probability or a null event. However, it does not follow that A = ∅. Similarly, if PB = 1
for some B ∈ 𝒮, we call B a certain event but it does not follow that B = Ω.


Theorem 2 (The Addition Rule). If A, B ∈ 𝒮, then

\[ P(A \cup B) = PA + PB - P(A \cap B). \tag{2} \]

Proof. Clearly

\[ A \cup B = (A - B) + (B - A) + (A \cap B) \]

and

\[ A = (A \cap B) + (A - B), \qquad B = (A \cap B) + (B - A). \]

The result follows by countable additivity of P.


Corollary 1. P is subadditive, that is, if A, B ∈ 𝒮, then

\[ P(A \cup B) \le PA + PB. \tag{3} \]


Corollary 1 can be extended to an arbitrary number of events $A_j$,


\[ P\Big(\bigcup_j A_j\Big) \le \sum_j PA_j. \tag{4} \]


Corollary 2. If $B = A^c$, then A and B are disjoint and

\[ PA = 1 - PA^c. \tag{5} \]

The following generalization of (2) is left as an exercise.


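Theorem 3 (The Principle of Inclusion-Exclusion). Let $A_1, A_2, \ldots, A_n \in \mathcal{S}$. Then


\[
\begin{aligned}
P\Big(\bigcup_{k=1}^{n} A_k\Big) &= \sum_{k=1}^{n} PA_k - \sum_{k_1 < k_2} P(A_{k_1} \cap A_{k_2})\\
&\quad + \sum_{k_1 < k_2 < k_3} P(A_{k_1} \cap A_{k_2} \cap A_{k_3})\\
&\quad + \cdots + (-1)^{n+1} P\Big(\bigcap_{k=1}^{n} A_k\Big).
\end{aligned}
\tag{6}
\]


Example 4. A die is rolled twice. Let all the elementary events in Ω = {(i,j): i,j =
1,2,...,6} be assigned the same probability. Let A be the event that the first throw shows
a number ≤ 2, and B, the event that the second throw shows at least 5. Then


\[
\begin{aligned}
A &= \{(i,j) : 1 \le i \le 2,\ j = 1,2,\ldots,6\},\\
B &= \{(i,j) : 5 \le j \le 6,\ i = 1,2,\ldots,6\},\\
A \cap B &= \{(1,5), (1,6), (2,5), (2,6)\};
\end{aligned}
\]


\[
P(A \cup B) = PA + PB - P(A \cap B) = \tfrac{1}{3} + \tfrac{1}{3} - \tfrac{4}{36} = \tfrac{5}{9}.
\]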

Example 5. A coin is tossed three times. Let us assign equal probability to each of the $2^3$
elementary events in Ω. Let A be the event that at least one head shows up in three throws.
Then


\[
\begin{aligned}
P(A) &= 1 - P(A^c)\\
&= 1 - P(\text{no heads})\\
&= 1 - P(TTT) = \tfrac{7}{8}.
\end{aligned}
\]


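We next derive two useful inequalities.

Theorem 4 (Bonferroni’s Inequality). Given n (> 1) events $A_1, A_2, \ldots, A_n$,

\[
\sum_{i=1}^{n} PA_i - \sum_{i<j} P(A_i \cap A_j) \le P\Big(\bigcup_{i=1}^{n} A_i\Big) \le \sum_{i=1}^{n} PA_i. \tag{7}
\]


Proof. In view of (4) it suffices to prove the left side of (7). The proof is by induction.
The inequality on the left is true for n = 2 since


\[ PA_1 + PA_2 - P(A_1 \cap A_2) = P(A_1 \cup A_2). \]


For n = 3,


\[
P\Big(\bigcup_{i=1}^{3} A_i\Big) = \sum_{i=1}^{3} PA_i - \sum_{i<j} P(A_i \cap A_j) + P(A_1 \cap A_2 \cap A_3),
\]


and the result holds. Assuming that (7) holds for 3 ≤ m ≤ n − 1, we show that it holds also
for m + 1:


\[
\begin{aligned}
P\Big(\bigcup_{i=1}^{m+1} A_i\Big) &= P\Big(\Big(\bigcup_{i=1}^{m} A_i\Big) \cup A_{m+1}\Big)\\
&= P\Big(\bigcup_{i=1}^{m} A_i\Big) + PA_{m+1} - P\Big(A_{m+1} \cap \Big(\bigcup_{i=1}^{m} A_i\Big)\Big)\\
&\ge \sum_{i=1}^{m+1} PA_i - \sum_{i<j\le m} P(A_i \cap A_j) - P\Big(\bigcup_{i=1}^{m} (A_i \cap A_{m+1})\Big)\\
&\ge \sum_{i=1}^{m+1} PA_i - \sum_{i<j\le m} P(A_i \cap A_j) - \sum_{i=1}^{m} P(A_i \cap A_{m+1})\\
&= \sum_{i=1}^{m+1} PA_i - \sum_{i<j\le m+1} P(A_i \cap A_j).
\end{aligned}
\]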

Theorem 5 (Boole’s Inequality). For any two events, A and B,


\[ P(A \cap B) \ge 1 - PA^c - PB^c. \tag{8} \]

Corollary 1. Let $\{A_j\}$, $j = 1,2,\ldots$, be a countable sequence of events; then

\[ P\Big(\bigcap_j A_j\Big) \ge 1 - \sum_j P(A_j^c). \tag{9} \]

Proof. Take

\[ B = \bigcap_{j \ge 2} A_j \quad\text{and}\quad A = A_1 \]

in (8).


Corollary 2 (The Implication Rule). If A, B, C ∈ 𝒮 and A and B imply C, then

\[ PC^c \le PA^c + PB^c. \tag{10} \]

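Let $\{A_n\}$ be a sequence of sets. The set of all points ω ∈ Ω that belong to $A_n$ for infinitely
many values of n is known as the limit superior of the sequence and is denoted by


\[ \limsup_{n\to\infty} A_n \quad\text{or}\quad \overline{\lim}_{n\to\infty} A_n. \]


The set of all points that belong to $A_n$ for all but a finite number of values of n is known
as the limit inferior of the sequence $\{A_n\}$ and is denoted by


\[ \liminf_{n\to\infty} A_n \quad\text{or}\quad \underline{\lim}_{n\to\infty} A_n. \]


If


\[ \underline{\lim}_{n\to\infty} A_n = \overline{\lim}_{n\to\infty} A_n, \]


we say that the limit exists and write $\lim_{n\to\infty} A_n$ for the common set and call it the limit
set.

We have

\[
\underline{\lim}_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k \subseteq \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k = \overline{\lim}_{n\to\infty} A_n.
\]


If the sequence $\{A_n\}$ is such that $A_n \subseteq A_{n+1}$, for n = 1,2,..., it is called nondecreasing;
if $A_n \supseteq A_{n+1}$, n = 1,2,..., it is called nonincreasing. If the sequence $A_n$ is nondecreasing,
we write $A_n \nearrow$; if $A_n$ is nonincreasing, we write $A_n \searrow$. Clearly, if $A_n \nearrow$ or $A_n \searrow$, the limit
exists and we have


\[ \lim_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} A_n \quad\text{if } A_n \nearrow \]

and

\[ \lim_{n\to\infty} A_n = \bigcap_{n=1}^{\infty} A_n \quad\text{if } A_n \searrow. \]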

Theorem 6. Let $\{A_n\}$ be a nondecreasing sequence of events in 𝒮, that is, $A_n \in \mathcal{S}$,
n = 1,2,..., and


\[ A_n \supseteq A_{n-1}, \qquad n = 2,3,\ldots. \]


Then

\[ \lim_{n\to\infty} PA_n = P\Big(\lim_{n\to\infty} A_n\Big) = P\Big(\bigcup_{n=1}^{\infty} A_n\Big). \tag{11} \]

Proof. Let

\[ A = \bigcup_{j=1}^{\infty} A_j. \]

Then


\[ A = A_n + \sum_{j=n}^{\infty} (A_{j+1} - A_j). \]


By countable additivity we have


\[ PA = PA_n + \sum_{j=n}^{\infty} P(A_{j+1} - A_j), \]

and letting n → ∞, we see that


\[ PA = \lim_{n\to\infty} PA_n + \lim_{n\to\infty} \sum_{j=n}^{\infty} P(A_{j+1} - A_j). \]

The second term on the right tends to 0 as n → ∞ since the sum $\sum_{j=1}^{\infty} P(A_{j+1} - A_j) \le 1$
and each summand is nonnegative. The result follows.
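Continuity from below can be illustrated with the probability of Example 3, where PI is the integral of e^{-x} over I. Taking the nondecreasing events A_n = (0, n], whose union is all of (0, ∞), gives PA_n = 1 - e^{-n}, which should increase to 1. This is a sketch; the choice of A_n and the function name are assumptions made for illustration.

```python
import math

def P_initial_segment(n):
    """P((0, n]) when P(I) is the integral of exp(-x) over I."""
    return 1.0 - math.exp(-n)

values = [P_initial_segment(n) for n in range(1, 31)]

# The probabilities are nondecreasing and approach P((0, inf)) = 1.
assert all(a <= b for a, b in zip(values, values[1:]))
print(values[0], values[9], values[-1])
```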


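Corollary. Let $\{A_n\}$ be a nonincreasing sequence of events in 𝒮. Then


\[ \lim_{n\to\infty} PA_n = P\Big(\lim_{n\to\infty} A_n\Big) = P\Big(\bigcap_{n=1}^{\infty} A_n\Big). \tag{12} \]


Proof. Consider the nondecreasing sequence of events $\{A_n^c\}$. Then

\[ \lim_{n\to\infty} A_n^c = \bigcup_{j=1}^{\infty} A_j^c = A^c, \qquad\text{where } A = \bigcap_{j=1}^{\infty} A_j. \]

It follows from Theorem 6 that

\[ \lim_{n\to\infty} PA_n^c = P\Big(\lim_{n\to\infty} A_n^c\Big) = P\Big(\bigcup_{j=1}^{\infty} A_j^c\Big) = P(A^c). \]


In other words,


\[ \lim_{n\to\infty} (1 - PA_n) = 1 - PA, \]


as asserted.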


Remark 5. Theorem 6 and its corollary will be used quite frequently in subsequent chap- 
ters. Property (11) is called the continuity of P from below, and (12) is known as the 
continuity of P from above. Thus Theorem 6 and its corollary assure us that the set function 
P is continuous from above and below. 
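
Continuity from below can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes the nonnegative integers as sample space with the assignment $P\{x\} = p(1-p)^x$ (the form appearing in Problem 1(b) below, assumed here to be a probability), and the nondecreasing events $A_n = \{0, 1, \ldots, n\}$, whose union is the whole space. The function name `prob_initial_segment` is hypothetical.

```python
from fractions import Fraction


def prob_initial_segment(n: int, p: Fraction) -> Fraction:
    """P(A_n) for A_n = {0, 1, ..., n} when P{x} = p(1-p)^x (assumed assignment)."""
    return sum(p * (1 - p) ** x for x in range(n + 1))


p = Fraction(1, 3)
# P(A_0), P(A_1), ..., P(A_59): a nondecreasing sequence of probabilities
values = [prob_initial_segment(n, p) for n in range(60)]

# The gap 1 - P(A_n) equals (1-p)^(n+1), which shrinks to 0 as n grows,
# so P(A_n) increases to P(union of the A_n) = 1.
gap = 1 - values[-1]
```

Exact rational arithmetic is used so that the monotone increase of $PA_n$ toward 1 is not blurred by rounding.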


We conclude this section with some remarks concerning the use of the word “ran- 
dom” in this book. In probability theory “random” has essentially three meanings. First, 
in sampling from a finite population a sample is said to be a random sample if at each 
draw all members available for selection have the same probability of being included. 
We will discuss sampling from a finite population in Section 1.4. Second, we speak of a 
random sample from a probability distribution. This notion is formalized in Section 6.2. 
The third meaning arises in the context of geometric probability, where statements such 
as “a point is randomly chosen from the interval (a,b)” and “a point is picked randomly 
from a unit square” are frequently encountered. Once we have studied random variables 
and their distributions, problems involving geometric probabilities may be formulated 
in terms of problems involving independent uniformly distributed random variables, and 
these statements can be given appropriate interpretations. 

Roughly speaking, these statements involve a certain assignment of probability. The
word “random” expresses our desire to assign equal probability to sets of equal lengths,
areas, or volumes. Let $\Omega \subseteq \mathcal{R}_n$ be a given set, and $A$ be a subset of $\Omega$. We are interested in
the probability that a “randomly chosen point” in $\Omega$ falls in $A$. Here “randomly chosen”
means that the point may be any point of $\Omega$ and that the probability of its falling in some
subset $A$ of $\Omega$ is proportional to the measure of $A$ (independently of the location and shape
of $A$). Assuming that both $A$ and $\Omega$ have well-defined finite measures (length, area, volume,
etc.), we define
\[ PA = \frac{\text{measure}(A)}{\text{measure}(\Omega)}. \]
(In the language of measure theory we are assuming that $\Omega$ is a measurable subset of $\mathcal{R}_n$
that has a finite, positive Lebesgue measure. If $A$ is any measurable set, $PA = \mu(A)/\mu(\Omega)$,
where $\mu$ is the $n$-dimensional Lebesgue measure.) Thus, if a point is chosen at random
from the interval $(a,b)$, the probability that it lies in the interval $(c,d)$, $a \le c < d \le b$,
is $(d-c)/(b-a)$. Moreover, the probability that the randomly selected point lies in any
interval of length $(d-c)$ is the same.

We present some examples. 


Example 6. A point is picked “at random” from a unit square. Let $\Omega = \{(x,y): 0 < x < 1,
0 < y < 1\}$. It is clear that all rectangles and their unions must be in $\mathcal{S}$. So too should all
circles in the unit square, since the area of a circle is also well defined. Indeed, every set
that has a well-defined area has to be in $\mathcal{S}$. We choose $\mathcal{S} = \mathfrak{B}$, the Borel $\sigma$-field generated
by rectangles in $\Omega$. As for the probability assignment, if $A \in \mathcal{S}$, we assign $PA$ to $A$, where
$PA$ is the area of the set $A$. If $A = \{(x,y): 0 < x < 1/2, 1/2 < y < 1\}$, then $PA = 1/4$. If
$B$ is a circle with center $(1/2, 1/2)$ and radius $1/2$, then $PB = \pi(1/2)^2 = \pi/4$. If $C$ is the
set of all points which are at most a unit distance from the origin, then $PC = \pi/4$ (see
Figs. 1-3).
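
The three areas in Example 6 can be checked numerically. The following is a minimal sketch, not part of the original text: it approximates each probability by the fraction of cell midpoints of a fine grid on the unit square that fall in the set. The helper name `area_fraction` and the grid size are assumptions made for illustration.

```python
def area_fraction(indicator, n: int = 400) -> float:
    """Fraction of the n*n cell midpoints of the unit square satisfying `indicator`."""
    hits = 0
    for i in range(n):
        x = (i + 0.5) / n
        for j in range(n):
            y = (j + 0.5) / n
            if indicator(x, y):
                hits += 1
    return hits / (n * n)


# A: the rectangle 0 < x < 1/2, 1/2 < y < 1
p_a = area_fraction(lambda x, y: x < 0.5 and y > 0.5)
# B: the circle with center (1/2, 1/2) and radius 1/2
p_b = area_fraction(lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25)
# C: points of the unit square at most a unit distance from the origin
p_c = area_fraction(lambda x, y: x * x + y * y <= 1.0)
```

The grid values should be close to 1/4, $\pi/4$, and $\pi/4$, respectively; the agreement improves as `n` grows.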


Fig. 1. $A = \{(x,y): 0 < x < 1/2,\ 1/2 < y < 1\}$.

Fig. 2. $B = \{(x,y): (x - 1/2)^2 + (y - 1/2)^2 \le 1/4\}$.

Fig. 3. $C = \{(x,y): x^2 + y^2 \le 1\}$.


Example 7 (Buffon’s Needle Problem). We return to Example 1.2.9. A needle (rod) of
length $l$ is tossed at random on a plane that is ruled with a series of parallel lines at distance
$2l$ apart. We wish to find the probability that the needle will intersect one of the lines.
Denoting by $r$ the distance from the center of the needle to the closest line and by $\theta$ the
angle that the needle forms with this line, we see that a necessary and sufficient condition
for the needle to intersect the line is that $r < (l/2)\sin\theta$. The needle will intersect the
nearest line if and only if its center falls in the shaded region in Fig. 1.2.2. We assign
probability to an event $A$ as follows:

\[ PA = \frac{\text{area of set } A}{l\pi}. \]
Thus the required probability is

\[ \frac{1}{l\pi} \int_0^{\pi} \frac{l}{2} \sin\theta \, d\theta = \frac{1}{\pi}. \]
Here we have interpreted “at random” to mean that the position of the needle is character-
ized by a point $(r,\theta)$ which lies in the rectangle $0 \le r \le l$, $0 \le \theta \le \pi$. We have assumed
that the probability that the point $(r,\theta)$ lies in any arbitrary subset of this rectangle is pro-
portional to the area of this set. Roughly, this means that “all positions of the midpoint of
the needle are assigned the same weight and all directions of the needle are assigned the
same weight.”
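
The integral in Example 7 can be evaluated numerically as a check. This is a minimal sketch under the example's own assumptions (needle length $l$, lines $2l$ apart, uniform assignment on the rectangle of area $l\pi$); the function name `buffon_probability` and the midpoint rule are choices made here for illustration.

```python
import math


def buffon_probability(l: float = 1.0, steps: int = 100_000) -> float:
    """Midpoint-rule value of (1/(l*pi)) * integral over [0, pi] of (l/2) sin(theta)."""
    h = math.pi / steps
    # Area of the region r < (l/2) sin(theta) inside the rectangle 0 <= r <= l, 0 <= theta <= pi
    area = sum((l / 2) * math.sin((i + 0.5) * h) for i in range(steps)) * h
    return area / (l * math.pi)


p_cross = buffon_probability()
```

The value does not depend on $l$, since the favorable area and the area of the rectangle both scale linearly with the needle length.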


Example 8. An interval of length 1, say (0, 1), is divided into three intervals by choosing 
two points at random. What is the probability that the three line segments form a triangle? 
It is clear that a necessary and sufficient condition for the three segments to form a 
triangle is that the length of any one of the segments be less than the sum of the other two. 
Let $x, y$ be the abscissas of the two points chosen at random. Then we must have either

\[ 0 < x < \tfrac{1}{2} < y < 1 \quad\text{and}\quad y - x < \tfrac{1}{2} \]
or

\[ 0 < y < \tfrac{1}{2} < x < 1 \quad\text{and}\quad x - y < \tfrac{1}{2}. \]

This is precisely the shaded area in Fig. 4. It follows that the required probability is 1/4.

Fig. 4. $\{(x,y): 0 < x < 1/2 < y < 1 \text{ and } (y - x) < 1/2, \text{ or } 0 < y < 1/2 < x < 1 \text{ and } (x - y) < 1/2\}$.

If it is specified in advance that the point $x$ is chosen at random from $(0, 1/2)$, and the
point $y$ at random from $(1/2, 1)$, we must have

\[ 0 < x < \tfrac{1}{2}, \qquad \tfrac{1}{2} < y < 1, \]


PROBABILITY AXIOMS 17 


and
\[ y - x < x + 1 - y \quad\text{or}\quad 2(y - x) < 1. \]


In this case the area bounded by these lines is the shaded area in Fig. 5, and it follows that 
the required probability is 1/2. 
Note the difference in sample spaces in the two computations made above. 
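
The two answers of Example 8 can be checked by approximating areas on a grid. The sketch below is an illustration only; it uses the fact that three segments of total length 1 form a triangle exactly when each is shorter than 1/2. The names `forms_triangle` and `grid_fraction` are hypothetical.

```python
def forms_triangle(x: float, y: float) -> bool:
    """True if the segments cut from (0, 1) by the points x and y form a triangle."""
    lo, hi = min(x, y), max(x, y)
    # Each segment must be shorter than the sum of the other two, i.e. shorter than 1/2.
    return max(lo, hi - lo, 1 - hi) < 0.5


def grid_fraction(x_range, y_range, n: int = 400) -> float:
    """Fraction of grid midpoints of the rectangle x_range * y_range that give a triangle."""
    (x0, x1), (y0, y1) = x_range, y_range
    hits = 0
    for i in range(n):
        x = x0 + (x1 - x0) * (i + 0.5) / n
        for j in range(n):
            y = y0 + (y1 - y0) * (j + 0.5) / n
            hits += forms_triangle(x, y)
    return hits / (n * n)


# Both points chosen at random from (0, 1): sample space is the unit square.
p_unrestricted = grid_fraction((0.0, 1.0), (0.0, 1.0))
# x chosen from (0, 1/2) and y from (1/2, 1): sample space is a quarter of the square.
p_restricted = grid_fraction((0.0, 0.5), (0.5, 1.0))
```

The same event has a different probability in the two computations only because the sample space over which the area is normalized has changed.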


Example 9 (Bertrand’s Paradox). A chord is drawn at random in the unit circle. What is 
the probability that the chord is longer than the side of the equilateral triangle inscribed in 
the circle? 

We present here three solutions to this problem, depending on how we interpret the 
phrase “at random.” The paradox is resolved once we define the probability spaces 
carefully. 


SOLUTION 1. Since the length of a chord is uniquely determined by the position of
its midpoint, choose a point $C$ at random in the circle and draw a line through $C$ and $O$,
the center of the circle (Fig. 6). Draw the chord through $C$ perpendicular to the line $OC$.
If $l_1$ is the length of the chord with $C$ as midpoint, $l_1 > \sqrt{3}$ if and only if $C$ lies inside the
circle with center $O$ and radius 1/2. Thus $PA = \pi(1/2)^2/\pi = 1/4$.

In this case $\Omega$ is the circle with center $O$ and radius 1, and the event $A$ is the concentric
circle with center $O$ and radius $\tfrac{1}{2}$. $\mathcal{S}$ is the usual Borel $\sigma$-field of subsets of $\Omega$.


Fig. 5. $\{(x,y): 0 < x < 1/2,\ 1/2 < y < 1 \text{ and } 2(y - x) < 1\}$.


SOLUTION 2. Because of symmetry, we may fix one end point of the chord at some
point $P$ and then choose the other end point $P_1$ at random. Let the probability that $P_1$ lies
on an arbitrary arc of the circle be proportional to the length of this arc. Now the inscribed
equilateral triangle having $P$ as one of its vertices divides the circumference into three
equal parts. A chord drawn through $P$ will be longer than the side of the triangle if and
only if the other end point $P_1$ (Fig. 7) of the chord lies on that one third of the circumference
that is opposite to $P$. It follows that the required probability is 1/3. In this case $\Omega = [0, 2\pi]$,
$\mathcal{S} = \mathfrak{B}_1 \cap \Omega$, and $A = [2\pi/3, 4\pi/3]$.


Fig. 6


Fig. 7


SOLUTION 3. Note that the length of a chord is uniquely determined by the distance of 
its midpoint from the center of the circle. Due to the symmetry of the circle, we assume that 
the midpoint of the chord lies on a fixed radius, OM, of the circle (Fig. 8). The probability 
that the midpoint M lies in a given segment of the radius through M is then proportional 
to the length of this segment. Clearly, the length of the chord will be longer than the side 
of the inscribed equilateral triangle if the length of OM is less than radius/2. It follows 
that the required probability is 1/2. 
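
The three interpretations of “at random” can be compared side by side. The following sketch is illustrative only and uses deterministic grids rather than simulation; the function names are hypothetical. Each function parameterizes the chord as in the corresponding solution and reports the fraction of parameter values giving a chord longer than $\sqrt{3}$.

```python
import math

SIDE = math.sqrt(3)  # side of the equilateral triangle inscribed in the unit circle


def by_midpoint_in_disk(n: int = 400) -> float:
    """Solution 1: midpoint uniform over the disk; chord length is 2*sqrt(1 - d^2)."""
    inside = longer = 0
    for i in range(n):
        x = -1 + (2 * i + 1) / n
        for j in range(n):
            y = -1 + (2 * j + 1) / n
            d2 = x * x + y * y
            if d2 < 1:
                inside += 1
                if 2 * math.sqrt(1 - d2) > SIDE:
                    longer += 1
    return longer / inside


def by_endpoint_on_circle(n: int = 100_000) -> float:
    """Solution 2: arc t between end points uniform on (0, 2*pi); chord is 2*sin(t/2)."""
    step = 2 * math.pi / n
    return sum(1 for i in range(n) if 2 * math.sin((i + 0.5) * step / 2) > SIDE) / n


def by_distance_on_radius(n: int = 100_000) -> float:
    """Solution 3: distance d of the midpoint from O uniform on (0, 1)."""
    return sum(1 for i in range(n) if 2 * math.sqrt(1 - ((i + 0.5) / n) ** 2) > SIDE) / n


p1, p2, p3 = by_midpoint_in_disk(), by_endpoint_on_circle(), by_distance_on_radius()
```

The three values differ because each function places a uniform assignment on a different sample space, which is exactly the resolution of the paradox described above.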




Fig. 8 


PROBLEMS 1.3 


1. Let $\Omega$ be the set of all nonnegative integers and $\mathcal{S}$ the class of all subsets of $\Omega$. In
each of the following cases does $P$ define a probability on $(\Omega, \mathcal{S})$?

(a) For $A \in \mathcal{S}$, let
\[ PA = \sum_{x \in A} \frac{e^{-\lambda} \lambda^x}{x!}, \qquad \lambda > 0. \]

(b) For $A \in \mathcal{S}$, let
\[ PA = \sum_{x \in A} p(1-p)^x, \qquad 0 < p < 1. \]

(c) For $A \in \mathcal{S}$, let $PA = 1$ if $A$ has a finite number of elements, and $PA = 0$ otherwise.

2. Let $\Omega = \mathcal{R}$ and $\mathcal{S} = \mathfrak{B}$. In each of the following cases does $P$ define a probability on
$(\Omega, \mathcal{S})$?

(a) For each interval $I$, let
\[ PI = \int_I \frac{1}{\pi} \frac{1}{1 + x^2} \, dx. \]

(b) For each interval $I$, let $PI = 1$ if $I$ is an interval of finite length and $PI = 0$ if $I$ is
an infinite interval.

(c) For each interval $I$, let $PI = 0$ if $I \subseteq (-\infty, 1)$ and $PI = \int_I (1/2) \, dx$ if $I \subseteq [1, \infty)$.
(If $I = I_1 + I_2$, where $I_1 \subseteq (-\infty, 1)$ and $I_2 \subseteq [1, \infty)$, then $PI = PI_2$.)

3. Let $A$ and $B$ be two events such that $B \supseteq A$. What is $P(A \cup B)$? What is $P(A \cap B)$?
What is $P(A - B)$?

4. In Problems 1(a) and (b), let $A$ = {all integers > 2}, $B$ = {all nonnegative
integers < 3}, and $C$ = {all integers $x$, 3 < $x$ < 6}. Find $PA$, $PB$, $PC$, $P(A \cap B)$,
$P(A \cup B)$, $P(B \cup C)$, $P(A \cap C)$, and $P(B \cap C)$.

5. In Problem 2(a) let A be the event A = {x: x > 0}. Find PA. Also find P{x: x > 0}. 


6. A box contains 1000 light bulbs. The probability that there is at least 1 defective bulb 
in the box is 0.1, and the probability that there are at least 2 defective bulbs is 0.05. 
Find the probability in each of the following cases: 


(a) The box contains no defective bulbs. 
(b) The box contains exactly 1 defective bulb.
(c) The box contains at most 1 defective bulb. 

7. Two points are chosen at random on a line of unit length. Find the probability that 
each of the three line segments so formed will have a length >1/4. 

8. Find the probability that the sum of two randomly chosen positive numbers (both 
<1) will not exceed 1 and that their product will be <2/9. 

9. Prove Theorem 3. 

10. Let $\{A_n\}$ be a sequence of events such that $A_n \to A$ as $n \to \infty$. Show that $PA_n \to PA$
as $n \to \infty$.

11. The base and the altitude of a right triangle are obtained by picking points randomly
from $[0,a]$ and $[0,b]$, respectively. Show that the probability that the area
of the triangle so formed will be less than $ab/4$ is $(1 + \ln 2)/2$.

12. A point X is chosen at random on a line segment AB. (i) Show that the probability 
that the ratio of lengths AX /BX is smaller than a (a > 0) is a/(1 +a). (ii) Show that 
the probability that the ratio of the length of the shorter segment to that of the larger 
segment is less than 1/3 is 1/2. 


1.4 COMBINATORICS: PROBABILITY ON FINITE SAMPLE SPACES 


In this section we restrict attention to sample spaces that have at most a finite number of
points. Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ and $\mathcal{S}$ be the $\sigma$-field of all subsets of $\Omega$. For any $A \in \mathcal{S}$,

\[ PA = \sum_{\omega_j \in A} P\{\omega_j\}. \]


Definition 1. An assignment of probability is said to be equally likely (or uniform) if each
elementary event in $\Omega$ is assigned the same probability. Thus, if $\Omega$ contains $n$ points $\omega_j$,
$P\{\omega_j\} = 1/n$, $j = 1, 2, \ldots, n$.


With this assignment

\[ PA = \frac{\text{number of elementary events in } A}{\text{total number of elementary events in } \Omega}. \tag{1} \]


Example 1. A coin is tossed twice. The sample space consists of four points. Under the
uniform assignment, each of four elementary events is assigned probability 1/4.


Example 2. Three dice are rolled. The sample space consists of $6^3$ points. Each one-point
set is assigned probability $1/6^3$.


In games of chance we usually deal with finite sample spaces where uniform proba- 
bility is assigned to all simple events. The same is the case in sampling schemes. In such 
instances the computation of the probability of an event A reduces to a combinatorial 
counting problem. We therefore consider some rules of counting. 


Rule 1. Given a collection of $n_1$ elements $a_{11}, a_{12}, \ldots, a_{1n_1}$, $n_2$ elements $a_{21}, a_{22}, \ldots, a_{2n_2}$,
and so on, up to $n_k$ elements $a_{k1}, a_{k2}, \ldots, a_{kn_k}$, it is possible to form $n_1 \cdot n_2 \cdots n_k$ ordered
$k$-tuples $(a_{1j_1}, a_{2j_2}, \ldots, a_{kj_k})$ containing one element of each kind, $1 \le j_i \le n_i$, $i = 1, 2, \ldots, k$.


Example 3. Here $r$ distinguishable balls are to be placed in $n$ cells. This amounts to choos-
ing one cell for each ball. The sample space consists of $n^r$ $r$-tuples $(i_1, i_2, \ldots, i_r)$, where $i_j$
is the cell number of the $j$th ball, $j = 1, 2, \ldots, r$ $(1 \le i_j \le n)$.

Consider $r$ tossings with a coin. There are $2^r$ possible outcomes. The probability that
no heads will show up in $r$ throws is $(1/2)^r$. Similarly, the probability that no 6 will turn
up in $r$ throws of a die is $(5/6)^r$.
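
Rule 1 and the two probabilities in Example 3 can be confirmed by listing the sample space explicitly for small $r$. This is a minimal sketch for illustration; the function name `prob_never_shows` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_never_shows(faces, excluded, r: int) -> Fraction:
    """Probability, under the uniform assignment, that `excluded` never shows in r trials."""
    outcomes = list(product(faces, repeat=r))  # all len(faces)**r ordered r-tuples (Rule 1)
    favorable = sum(1 for outcome in outcomes if excluded not in outcome)
    return Fraction(favorable, len(outcomes))


no_heads_in_4_tosses = prob_never_shows("HT", "H", 4)        # (1/2)^4
no_six_in_3_throws = prob_never_shows(range(1, 7), 6, 3)     # (5/6)^3
```

Enumeration is practical only for small $r$, since the number of $r$-tuples grows as $n^r$; the closed forms in the text are what one uses in general.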


Rule 2 is concerned with ordered samples. Consider a set of $n$ elements $a_1, a_2, \ldots, a_n$.
Any ordered arrangement $(a_{i_1}, a_{i_2}, \ldots, a_{i_r})$ of $r$ of these $n$ symbols is called an ordered
sample of size $r$. If elements are selected one by one, there are two possibilities:


1. Sampling with replacement. In this case repetitions are permitted, and we can draw
samples of an arbitrary size. Clearly there are $n^r$ samples of size $r$.


2. Sampling without replacement. In this case an element once chosen is not replaced,
so that there can be no repetitions. Clearly the sample size cannot exceed $n$, the size
of the population. There are $n(n-1)\cdots(n-r+1) = {}_nP_r$, say, possible samples of
size $r$. Clearly ${}_nP_r = 0$ for integers $r > n$. If $r = n$, then ${}_nP_r = n!$.


Rule 2. If ordered samples of size $r$ are drawn from a population of $n$ elements, there are
$n^r$ different samples with replacement and ${}_nP_r$ samples without replacement.


Corollary. The number of permutations of n objects is n!. 


Remark 1. We will frequently use the term “random sample” in this book to describe the
equal assignment of probability to all possible samples in sampling from a finite population.
Thus, when we speak of a random sample of size $r$ from a population of $n$ elements,
it means that each of $n^r$ samples, in sampling with replacement, has the same probability
$1/n^r$ or that each of ${}_nP_r$ samples, in sampling without replacement, is assigned probability
$1/{}_nP_r$.


Example 4. Consider a set of $n$ elements. A sample of size $r$ is drawn at random with
replacement. Then the probability that no element appears more than once is clearly
${}_nP_r / n^r$.

Thus, if $n$ balls are to be randomly placed in $n$ cells, the probability that each cell will
be occupied is $n!/n^n$.


Example 5. Consider a class of $r$ students. The birthdays of these $r$ students form a sample
of size $r$ from the 365 days in the year. Then the probability that all $r$ birthdays are different
is ${}_{365}P_r/(365)^r$. One can show that this probability is $< 1/2$ if $r = 23$.

The following table gives the values of $q_r = {}_{365}P_r/(365)^r$ for some selected values of $r$.

  r   |  20     23     25     30     35     60
  q_r | 0.589  0.493  0.431  0.294  0.186  0.006

Next suppose that each of the $r$ students is asked for his birth date in order, with the
instruction that as soon as a student hears his birth date he is to raise his hand. Let us
compute the probability that a hand is first raised when the $k$th $(k = 1, 2, \ldots, r)$ student
is asked his birth date. Let $p_k$ be the probability that the procedure terminates at the $k$th
student. Then
\[ p_1 = 1 - \Bigl(\frac{364}{365}\Bigr)^{r-1} \]
and

\[ p_k = \frac{{}_{365}P_{k-1}}{(365)^{k-1}} \Bigl(1 - \frac{k-1}{365}\Bigr)^{r-k+1} \Bigl[1 - \Bigl(\frac{365-k}{365-k+1}\Bigr)^{r-k}\Bigr], \qquad k = 2, 3, \ldots, r. \]
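
The table of $q_r$ and the termination probabilities $p_k$ can be recomputed directly. The sketch below is illustrative; the function names and the `days` parameter are assumptions introduced here. It also uses the observation that the events “the procedure terminates at the $k$th student,” $k = 1, \ldots, r$, partition the event that at least two birthdays coincide, so the $p_k$ should sum to $1 - q_r$.

```python
from fractions import Fraction
from math import perm


def q(r: int, days: int = 365) -> Fraction:
    """Probability that all r birthdays are different: days_P_r / days^r."""
    return Fraction(perm(days, r), days ** r)


def p(k: int, r: int, days: int = 365) -> Fraction:
    """Probability that a hand is first raised when the k-th of r students is asked."""
    first_distinct = Fraction(perm(days, k - 1), days ** (k - 1))
    others_avoid_them = Fraction(days - k + 1, days) ** (r - k + 1)
    someone_matches_kth = 1 - Fraction(days - k, days - k + 1) ** (r - k)
    return first_distinct * others_avoid_them * someone_matches_kth


table = {r: round(float(q(r)), 3) for r in (20, 23, 25, 30, 35, 60)}
total = sum(p(k, 23) for k in range(1, 24))  # should equal 1 - q(23)
```

For $k = 1$ the general expression reduces to $1 - (364/365)^{r-1}$, so a single function covers both displayed formulas.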


Example 6. Let $\Omega$ be the set of all permutations of $n$ objects. Let $A_i$ be the set of all permu-
tations that leave the $i$th object unchanged. Then the set $\bigcup_{i=1}^{n} A_i$ is the set of permutations
with at least one fixed point. Clearly

\[ P(A_i) = \frac{(n-1)!}{n!}, \qquad P(A_i \cap A_j) = \frac{(n-2)!}{n!}, \quad i \ne j, \quad\text{etc.} \]

By Theorem 1.3.3 we have

\[ P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots \pm \frac{1}{n!}. \]

As an application consider an absent-minded secretary who places $n$ letters in $n$
envelopes at random. Then the probability that she will misplace every letter is

\[ 1 - \Bigl(1 - \frac{1}{2!} + \frac{1}{3!} - \cdots \pm \frac{1}{n!}\Bigr). \]

It is easy to see that this last probability $\to e^{-1} = 0.3679$ as $n \to \infty$.


Rule 3. There are $\binom{n}{r}$ different subpopulations of size $r \le n$ from a population of $n$
elements, where
\[ \binom{n}{r} = \frac{n!}{r!\,(n-r)!}. \tag{2} \]
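
The matching probabilities of Example 6 can be checked against a brute-force count, and the binomial coefficients of Rule 3 supply the number of index sets in each inclusion-exclusion term. The following is a sketch for small $n$ only; both function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def prob_fixed_point_formula(n: int) -> Fraction:
    """Inclusion-exclusion: sum over k of (-1)^(k+1) * C(n,k) * (n-k)!/n!."""
    return sum(
        Fraction((-1) ** (k + 1) * comb(n, k) * factorial(n - k), factorial(n))
        for k in range(1, n + 1)
    )


def prob_fixed_point_bruteforce(n: int) -> Fraction:
    """Direct count of permutations of n objects with at least one fixed point."""
    perms = list(permutations(range(n)))
    hits = sum(1 for perm_ in perms if any(perm_[i] == i for i in range(n)))
    return Fraction(hits, len(perms))


# Probability that every one of 10 letters is misplaced
all_misplaced_10 = 1 - prob_fixed_point_formula(10)
```

Since $\binom{n}{k}(n-k)!/n! = 1/k!$, the formula agrees term by term with the alternating series in the text; already for $n = 10$ the value is very close to $e^{-1}$.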


Example 7. Consider the random distribution of $r$ balls in $n$ cells. Let $A_k$ be the event that
a specified cell has exactly $k$ balls, $k = 0, 1, 2, \ldots, r$; $k$ balls can be chosen in $\binom{r}{k}$ ways. We
place $k$ balls in the specified cell and distribute the remaining $r - k$ balls in the $n - 1$ cells
in $(n-1)^{r-k}$ ways. Thus

\[ PA_k = \binom{r}{k} \frac{(n-1)^{r-k}}{n^r} = \binom{r}{k} \Bigl(\frac{1}{n}\Bigr)^{k} \Bigl(1 - \frac{1}{n}\Bigr)^{r-k}. \]


Example 8. There are $\binom{52}{13} = 635{,}013{,}559{,}600$ different hands at bridge, and $\binom{52}{5} =
2{,}598{,}960$ hands at poker.
The probability that all 13 cards in a bridge hand have different face values is $4^{13} / \binom{52}{13}$.
The probability that a hand at poker contains five different face values is $\binom{13}{5} 4^5 / \binom{52}{5}$.
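
The counts in Example 8 are easy to reproduce with exact integer arithmetic. This is a small illustrative sketch; the variable names are chosen here and are not from the text.

```python
from fractions import Fraction
from math import comb

bridge_hands = comb(52, 13)   # number of 13-card hands
poker_hands = comb(52, 5)     # number of 5-card hands

# One card of each of the 13 face values, with 4 choices of suit for each
p_bridge_all_different = Fraction(4 ** 13, bridge_hands)
# Choose 5 of the 13 face values, then a suit for each of the 5 cards
p_poker_all_different = Fraction(comb(13, 5) * 4 ** 5, poker_hands)
```

Keeping the probabilities as fractions avoids rounding; `float(p_poker_all_different)` gives a decimal value when one is wanted.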


Rule 4. Consider a population of $n$ elements. The number of ways in which the population
can be partitioned into $k$ subpopulations of sizes $r_1, r_2, \ldots, r_k$, respectively, $r_1 + r_2 + \cdots +
r_k = n$, $0 \le r_i \le n$, is given by
\[ \binom{n}{r_1, r_2, \ldots, r_k} = \frac{n!}{r_1!\, r_2! \cdots r_k!}. \tag{3} \]


The numbers defined in (3) are known as multinomial coefficients.


Proof. For the proof of Rule 4 one uses Rule 3 repeatedly. Note that

\[ \binom{n}{r_1, r_2, \ldots, r_k} = \binom{n}{r_1} \binom{n - r_1}{r_2} \cdots \binom{n - r_1 - \cdots - r_{k-2}}{r_{k-1}}. \]
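
The identity used in the proof of Rule 4 can be verified numerically for particular sizes. The sketch below is illustrative; the two function names are hypothetical, and the product form includes the final factor $\binom{r_k}{r_k} = 1$ for simplicity.

```python
from math import comb, factorial, prod


def multinomial_by_factorials(sizes) -> int:
    """n! / (r_1! r_2! ... r_k!) with n = r_1 + ... + r_k."""
    return factorial(sum(sizes)) // prod(factorial(r) for r in sizes)


def multinomial_by_rule_3(sizes) -> int:
    """Choose the first subpopulation, then the second from what remains, and so on."""
    remaining, ways = sum(sizes), 1
    for r in sizes:
        ways *= comb(remaining, r)
        remaining -= r
    return ways


deals = multinomial_by_factorials([13, 13, 13, 13])  # ways to split 52 cards into 4 hands
```

The second function mirrors the repeated application of Rule 3, so agreement of the two functions is a direct check of the displayed identity.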


Example 9. In a game of bridge the probability that a hand of 13 cards contains 2 spades,
7 hearts, 3 diamonds, and 1 club is

\[ \frac{\binom{13}{2} \binom{13}{7} \binom{13}{3} \binom{13}{1}}{\binom{52}{13}}. \]
Example 10. An urn contains 5 red, 3 green, 2 blue, and 4 white balls. A sample of size 8
is selected at random without replacement. The probability that the sample contains 2 red,
2 green, 1 blue, and 3 white balls is

\[ \frac{\binom{5}{2} \binom{3}{2} \binom{2}{1} \binom{4}{3}}{\binom{14}{8}}. \]


PROBLEMS 1.4


1. How many different words can be formed by permuting letters of the word “Mississippi”?
How many of these start with the letters “Mi”?

2. An urn contains $R$ red and $W$ white marbles. Marbles are drawn from the urn one
after another without replacement. Let $A_k$ be the event that a red marble is drawn for
the first time on the $k$th draw. Show that

\[ PA_k = \frac{R}{R + W - k + 1} \prod_{j=1}^{k-1} \Bigl(1 - \frac{R}{R + W - j + 1}\Bigr). \]

Let $p$ be the proportion of red marbles in the urn before the first draw. Show that
$PA_k \to p(1-p)^{k-1}$ as $R + W \to \infty$. Is this to be expected?

3. In a population of $N$ elements, $R$ are red and $W = N - R$ are white. A group of $n$
elements is selected at random. Find the probability that the group so chosen will
contain exactly $r$ red elements.

4. Each permutation of the digits 1, 2, 3, 4, 5, 6 determines a six-digit number. If the
numbers corresponding to all possible permutations are listed in increasing order of
magnitude, find the 319th number on this list.

5. The numbers $1, 2, \ldots, n$ are arranged in random order. Find the probability that the
digits $1, 2, \ldots, k$ $(k < n)$ appear as neighbors in that order.

6. A pin table has seven holes through which a ball can drop. Five balls are played.
Assuming that at each play a ball is equally likely to go down any one of the seven
holes, find the probability that more than one ball goes down at least one of the holes.

7. If $2n$ boys are divided into two equal subgroups find the probability that the two
tallest boys will be (a) in different subgroups and (b) in the same subgroup.

8. In a movie theater that can accommodate $n + k$ people, $n$ people are seated. What is
the probability that $r \le n$ given seats are occupied?

9. Waiting in line for a Saturday morning movie show are $2n$ children. Tickets are
priced at a quarter each. Find the probability that nobody will have to wait for change
if, before a ticket is sold to the first customer, the cashier has $2k$ $(k < n)$ quarters.
Assume that it is equally likely that each ticket is paid for with a quarter or a half-dollar
coin.

10. Each box of a certain brand of breakfast cereal contains a small charm, with $k$ distinct
charms forming a set. Assuming that the chance of drawing any particular charm is
equal to that of drawing any other charm, show that the probability of finding at least
one complete set of charms in a random purchase of $N \ge k$ boxes equals

\[ 1 - \binom{k}{1} \Bigl(\frac{k-1}{k}\Bigr)^{N} + \binom{k}{2} \Bigl(\frac{k-2}{k}\Bigr)^{N} - \binom{k}{3} \Bigl(\frac{k-3}{k}\Bigr)^{N} + \cdots + (-1)^{k-1} \binom{k}{k-1} \Bigl(\frac{1}{k}\Bigr)^{N}. \]

[Hint: Use (1.3.6).]

11. Prove Rules 1-4.

12. In a five-card poker game, find the probability that a hand will have:
(a) A royal flush (ace, king, queen, jack, and 10 of the same suit).
(b) A straight flush (five cards in a sequence, all of the same suit; ace is high but A,
2, 3, 4, 5 is also a sequence) excluding a royal flush.
(c) Four of a kind (four cards of the same face value).
(d) A full house (three cards of the same face value $x$ and two cards of the same face
value $y$).
(e) A flush (five cards of the same suit excluding cards in a sequence).
(f) A straight (five cards in a sequence).
(g) Three of a kind (three cards of the same face value and two cards of different
face values).
(h) Two pairs.
(i) A single pair.

13. (a) A married couple and four of their friends enter a row of seats in a concert hall.
What is the probability that the wife will sit next to her husband if all possible
seating arrangements are equally likely?
(b) In part (a), suppose the six people go to a restaurant after the concert and sit at
a round table. What is the probability that the wife will sit next to her husband?

14. Consider a town with $N$ people. A person sends two letters to two separate people,
each of whom is asked to repeat the procedure. Thus for each letter received, two
letters are sent out to separate persons chosen at random (irrespective of what happened
in the past). What is the probability that in the first $n$ stages the person who
started the chain letter game will not receive a letter?

15. Consider a town with $N$ people. A person tells a rumor to a second person, who in
turn repeats it to a third person, and so on. Suppose that at each stage the recipient
of the rumor is chosen at random from the remaining $N - 1$ people. What is the
probability that the rumor will be repeated $n$ times
(a) Without being repeated to any person.
(b) Without being repeated to the originator.

16. There were four accidents in a town during a seven-day period. Would you be surprised
if all four occurred on the same day? Each of the four occurred on a different
day?

17. While Rules 1 and 2 of counting deal with ordered samples with or without replacement,
Rule 3 concerns unordered sampling without replacement. The most difficult
rule of counting deals with unordered with replacement sampling. Show that there
are $\binom{n+r-1}{r}$ possible unordered samples of size $r$ from a population of $n$ elements
when sampled with replacement.


1.5 CONDITIONAL PROBABILITY AND BAYES THEOREM 


So far, we have computed probabilities of events on the assumption that no information 
was available about the experiment other than the sample space. Sometimes, however, 
it is known that an event H has happened. How do we use this information in mak- 
ing a statement concerning the outcome of another event A? Consider the following 
examples. 


Example 1. Let urn 1 contain one white and two black balls, and urn 2, one black and two
white balls. A fair coin is tossed. If a head turns up, a ball is drawn at random from urn 1;
otherwise, from urn 2. Let $E$ be the event that the ball drawn is black. The sample space
is $\Omega = \{Hb_{11}, Hb_{12}, Hw_{11}, Tb_{21}, Tw_{21}, Tw_{22}\}$, where $H$ denotes head, $T$ denotes tail, $b_{ij}$
denotes $j$th black ball in $i$th urn, $i = 1, 2$, and so on. Then

\[ PE = P\{Hb_{11}, Hb_{12}, Tb_{21}\} = \tfrac{3}{6} = \tfrac{1}{2}. \]

If, however, it is known that the coin showed a head, the ball could not have been drawn
from urn 2. Thus, the probability of $E$, conditional on information $H$, is $\tfrac{2}{3}$. Note that this
probability equals the ratio $P\{\text{Head and ball drawn black}\} / P\{\text{Head}\}$.


Example 2. Let us toss two fair coins. Then the sample space of the experiment is $\Omega =
\{HH, HT, TH, TT\}$. Let event $A$ = {both coins show same face} and $B$ = {at least one
coin shows H}. Then $PA = 2/4$. If $B$ is known to have happened, this information assures
that $TT$ cannot happen, and $P\{A$ conditional on the information that $B$ has happened$\} =
\tfrac{1}{3} = \tfrac{1/4}{3/4} = P(A \cap B)/PB$.
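
The ratio in Example 2 can be reproduced by listing the four outcomes and counting. This is a minimal sketch assuming the uniform assignment on $\{HH, HT, TH, TT\}$; the helper names `prob` and `cond_prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin, each outcome with probability 1/4
omega = ["".join(faces) for faces in product("HT", repeat=2)]


def prob(event) -> Fraction:
    """Probability of an event given as a predicate on outcomes (uniform assignment)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def cond_prob(a, h) -> Fraction:
    """P(A and H) / P(H); only meaningful when P(H) > 0."""
    return prob(lambda w: a(w) and h(w)) / prob(h)


def same_face(w):
    return w[0] == w[1]


def at_least_one_head(w):
    return "H" in w


p_a = prob(same_face)
p_a_given_b = cond_prob(same_face, at_least_one_head)
```

Conditioning on $B$ removes the outcome $TT$ and renormalizes the remaining three outcomes, which is why the probability of $A$ drops from 1/2 to 1/3.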


Definition 1. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $H \in \mathcal{S}$ with $PH > 0$. For an
arbitrary $A \in \mathcal{S}$ we shall write

\[ P\{A \mid H\} = \frac{P(A \cap H)}{PH} \tag{1} \]

and call the quantity so defined the conditional probability of $A$, given $H$. Conditional
probability remains undefined when $PH = 0$.


Theorem 1. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $H \in \mathcal{S}$ with $PH > 0$. Then
$(\Omega, \mathcal{S}, P_H)$, where $P_H(A) = P\{A \mid H\}$ for all $A \in \mathcal{S}$, is a probability space.


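Proof. Clearly $P_H(A) = P\{A \mid H\} \ge 0$ for all $A \in \mathcal{S}$. Also, $P_H(\Omega) = P(\Omega \cap H)/PH = 1$.
If $A_1, A_2, \ldots$ is a disjoint sequence of sets in $\mathcal{S}$, then
\[ P_H\Bigl(\sum_{i=1}^{\infty} A_i\Bigr) = P\Bigl\{\sum_{i=1}^{\infty} A_i \Bigm| H\Bigr\} = \frac{P\bigl\{\bigl(\sum_{i=1}^{\infty} A_i\bigr) \cap H\bigr\}}{PH} = \frac{\sum_{i=1}^{\infty} P(A_i \cap H)}{PH} = \sum_{i=1}^{\infty} P_H(A_i). \]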


Remark 1. What we have done is to consider a new sample space consisting of the basic
set $H$ and the $\sigma$-field $\mathcal{S}_H = \mathcal{S} \cap H$, of subsets $A \cap H$, $A \in \mathcal{S}$, of $H$. On this space we have
defined a set function $P_H$ by multiplying the probability of each event by $(PH)^{-1}$. Indeed,
$(H, \mathcal{S}_H, P_H)$ is a probability space.


Let $A$ and $B$ be two events with $PA > 0$, $PB > 0$. Then it follows from (1) that
\[ P(A \cap B) = PA \cdot P\{B \mid A\}, \qquad P(A \cap B) = PB \cdot P\{A \mid B\}. \tag{2} \]


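Equation (2) may be generalized to any number of events. Let $A_1, A_2, \ldots, A_n \in \mathcal{S}$, $n \ge 2$,
and assume that $P\bigl(\bigcap_{j=1}^{n-1} A_j\bigr) > 0$. Since

\[ A_1 \supseteq (A_1 \cap A_2) \supseteq (A_1 \cap A_2 \cap A_3) \supseteq \cdots \supseteq \Bigl(\bigcap_{j=1}^{n-1} A_j\Bigr), \]

we see that
\[ PA_1 > 0, \quad P(A_1 \cap A_2) > 0, \quad \ldots, \quad P\Bigl(\bigcap_{j=1}^{n-2} A_j\Bigr) > 0. \]

It follows that $P\bigl\{A_k \bigm| \bigcap_{j=1}^{k-1} A_j\bigr\}$ are well defined for $k = 2, 3, \ldots, n$.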


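Theorem 2 (The Multiplication Rule). Let $(\Omega, \mathcal{S}, P)$ be a probability space and $A_1, A_2, \ldots,
A_n \in \mathcal{S}$, with $P\bigl(\bigcap_{j=1}^{n-1} A_j\bigr) > 0$. Then

\[ P\Bigl(\bigcap_{j=1}^{n} A_j\Bigr) = P(A_1)\, P\{A_2 \mid A_1\}\, P\{A_3 \mid A_1 \cap A_2\} \cdots P\Bigl\{A_n \Bigm| \bigcap_{j=1}^{n-1} A_j\Bigr\}. \tag{3} \]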


Proof. The proof is simple. 


Let us suppose that $\{H_j\}$ is a countable collection of events in $\mathcal{S}$ such that $H_j \cap H_k = \Phi$,
$j \ne k$, and $\sum_{j=1}^{\infty} H_j = \Omega$. Suppose that $PH_j > 0$ for all $j$. Then

\[ PB = \sum_{j=1}^{\infty} P(H_j)\, P\{B \mid H_j\} \quad\text{for all } B \in \mathcal{S}. \tag{4} \]
For the proof we note that

\[ B = \sum_{j=1}^{\infty} (B \cap H_j) \]

and the result follows. Equation (4) is called the total probability rule.


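Example 3. Consider a hand of five cards in a game of poker. If the cards are dealt at
random, there are $\binom{52}{5}$ possible hands of five cards each. Let $A$ = {at least 3 cards of
spades} and $B$ = {all 5 cards of spades}. Then

\[ P(A \cap B) = P\{\text{all 5 cards of spades}\} = \frac{\binom{13}{5}}{\binom{52}{5}} \]

and

\[ P\{B \mid A\} = \frac{P(A \cap B)}{PA} = \frac{\binom{13}{5} \Big/ \binom{52}{5}}{\Bigl[\binom{13}{3}\binom{39}{2} + \binom{13}{4}\binom{39}{1} + \binom{13}{5}\Bigr] \Big/ \binom{52}{5}}. \]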
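Example 4. Urn 1 contains one white and two black marbles, urn 2 contains one black
and two white marbles, and urn 3 contains three black and three white marbles. A die is
rolled. If a 1, 2, or 3 shows up, urn 1 is selected; if a 4 shows up, urn 2 is selected; and if
a 5 or 6 shows up, urn 3 is selected. A marble is then drawn at random from the selected
urn. Let $A$ be the event that the marble drawn is white. If $U$, $V$, $W$, respectively, denote the
events that the urn selected is 1, 2, 3, then

\[ A = (A \cap U) + (A \cap V) + (A \cap W), \]

where

\[ P(A \cap U) = PU \cdot P\{A \mid U\} = \tfrac{3}{6} \cdot \tfrac{1}{3}, \quad P(A \cap V) = PV \cdot P\{A \mid V\} = \tfrac{1}{6} \cdot \tfrac{2}{3}, \quad P(A \cap W) = PW \cdot P\{A \mid W\} = \tfrac{2}{6} \cdot \tfrac{3}{6}. \]

It follows that

\[ PA = \tfrac{1}{6} + \tfrac{1}{9} + \tfrac{1}{6} = \tfrac{4}{9}. \]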


A simple consequence of the total probability rule is the Bayes rule, which we now
prove.


Theorem 3 (Bayes Rule). Let $\{H_n\}$ be a disjoint sequence of events such that $PH_n > 0$,
$n = 1, 2, \ldots$, and $\sum_{n=1}^{\infty} H_n = \Omega$. Let $B \in \mathcal{S}$ with $PB > 0$. Then

\[ P\{H_j \mid B\} = \frac{P(H_j)\, P\{B \mid H_j\}}{\sum_{i=1}^{\infty} P(H_i)\, P\{B \mid H_i\}}, \qquad j = 1, 2, \ldots. \tag{5} \]


Proof. From (2)
\[ P\{B \cap H_j\} = P(B)\, P\{H_j \mid B\} = PH_j\, P\{B \mid H_j\}, \]

and it follows that

\[ P\{H_j \mid B\} = \frac{PH_j\, P\{B \mid H_j\}}{PB}. \]

The result now follows on using (4).


Remark 2. Suppose that $H_1, H_2, \ldots$ are all the “causes” that lead to the outcome of a random
experiment. Let $H_j$ be the set of outcomes corresponding to the $j$th cause. Assume
that the probabilities $PH_j$, $j = 1, 2, \ldots$, called the prior probabilities, can be assigned. Now
suppose that the experiment results in an event $B$ of positive probability. This information
leads to a reassessment of the prior probabilities. The conditional probabilities $P\{H_j \mid B\}$
are called the posterior probabilities. Formula (5) can be interpreted as a rule giving the
probability that observed event $B$ was due to cause or hypothesis $H_j$.


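Example 5. In Example 4 let us compute the conditional probability $P\{V \mid A\}$.
We have

\[ P\{V \mid A\} = \frac{PV\, P\{A \mid V\}}{PU\, P\{A \mid U\} + PV\, P\{A \mid V\} + PW\, P\{A \mid W\}} = \frac{\tfrac{1}{6} \cdot \tfrac{2}{3}}{\tfrac{3}{6} \cdot \tfrac{1}{3} + \tfrac{1}{6} \cdot \tfrac{2}{3} + \tfrac{2}{6} \cdot \tfrac{3}{6}} = \frac{1/9}{4/9} = \frac{1}{4}. \]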

PROBLEMS 1.5 


1. Let A and B be two events such that PA = p; > 0, PB = po > 0, and pj + pz > 1. 
Show that P{B | A} > 1—[(1—p2)/pi]. 

2. Two digits are chosen at random without replacement from the set of integers 
{1,2,3,4,5,6,7, 8}. 
(a) Find the probability that both digits are greater than 5. 
(b) Show that the probability that the sum of the digits will be equal to 5 is the same 

as the probability that their sum will exceed 13. 

3. The probability of a family chosen at random having exactly k children is ap*, 0 < 

p <1. Suppose that the probability that any child has blue eyes is b, 0 <b < 1, 
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11. 


12. 
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independently of others. What is the probability that a family chosen at random has 
exactly r (r > 0) children with blue eyes? 


. In Problem 3 let us write 


px = probability of a randomly chosen family having exactly k children = ap*, 
Ka 1), 250085 
ap 
(1—p) 


Suppose that all sex distributions of k children are equally likely. Find the probability 
that a family has exactly r boys, r > 1. Find the conditional probability that a family 
has at least two boys, given that it has at least one boy. 


po=1- 


. Each of (N + 1) identical urns marked 0,1,2,...,N contains N balls. The kth urn 


contains k black and N — k white balls, k =0,1,2,...,N. An urn is chosen at random, 
and n random drawings are made from it, the ball drawn being always replaced. If 
all the n draws result in black balls, find the probability that the (n+ 1)th draw will 
also produce a black ball. How does this probability behave as N — oo? 


. Each of n urns contains four white and six black balls, while another urn contains 


five white and five black balls. An urn is chosen at random from the (n+ 1) urns, 
and two balls are drawn from it, both being black. The probability that five white 
and three black balls remain in the chosen urn is 1/7. Find n. 


. In answering a question on a multiple choice test, a candidate either knows the 


answer with probability p (0 < p < 1) or does not know the answer with probability 
1 — p. Ifhe knows the answer, he puts down the correct answer with probability 0.99, 
whereas if he guesses, the probability of his putting down the correct result is 1/k 
(k choices to the answer). Find the conditional probability that the candidate knew 
the answer to a question, given that he has made the correct answer. Show that this 
probability tends to 1 as k + oo. 


. An urn contains five white and four black balls. Four balls are transferred to a sec- 


ond urn. A ball is then drawn from this urn, and it happens to be black. Find the 
probability of drawing a white ball from among the remaining three. 


9. Prove Theorem 2.

10. An urn contains r red and g green marbles. A marble is drawn at random and its
color noted. Then the marble drawn, together with c > 0 marbles of the same color,
are returned to the urn. Suppose n such draws are made from the urn. Find the
probability of selecting a red marble at any draw.

11. Consider a bicyclist who leaves a point P (see Fig. 1), choosing one of the roads
PR1, PR2, PR3 at random. At each subsequent crossroad he again chooses a road at
random.

(a) What is the probability that he will arrive at point A? 

(b) What is the conditional probability that he will arrive at A via road PR3? 

Fig. 1 Map for Problem 11.

12. Five percent of patients suffering from a certain disease are selected to undergo a
new treatment that is believed to increase the recovery rate from 30 percent to 50
percent. A person is randomly selected from these patients after the completion of
the treatment and is found to have recovered. What is the probability that the patient
received the new treatment?

13. Four roads lead away from the county jail. A prisoner has escaped from the jail and 
selects a road at random. If road I is selected, the probability of escaping is 1/8; 
if road II is selected, the probability of success is 1/6; if road III is selected, the 
probability of escaping is 1/4; and if road IV is selected, the probability of success 
is 9/10. 

(a) What is the probability that the prisoner will succeed in escaping? 


(b) If the prisoner succeeds, what is the probability that the prisoner escaped by 
using road IV? Road I? 

14. A diagnostic test for a certain disease is 95 percent accurate; in that if a person has 
the disease, it will detect it with a probability of 0.95, and if a person does not have 
the disease, it will give a negative result with a probability of 0.95. Suppose only 0.5 
percent of the population has the disease in question. A person is chosen at random 
from this population. The test indicates that this person has the disease. What is the 
(conditional) probability that he or she does have the disease? 


1.6 INDEPENDENCE OF EVENTS 


Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $A, B \in \mathcal{S}$, with $PB > 0$. By the multiplication
rule we have

$$P(A \cap B) = P(B)\,P\{A \mid B\}.$$

In many experiments the information provided by B does not affect the probability of
event A, that is, $P\{A \mid B\} = P\{A\}$.


Example 1. Let two fair coins be tossed, and let A = {head on the second throw}, 
B = {head on the first throw}. Then 


$$P(A) = P\{HH, TH\} = \tfrac{1}{2}$$

and

$$P\{A \mid B\} = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{1/2} = \frac{1}{2} = P(A).$$

Thus

$$P(A \cap B) = P(A)\,P(B).$$

In the following, we will write $A \cap B = AB$.
Definition 1. Two events, A and B, are said to be independent if and only if

$$P(AB) = P(A)\,P(B). \tag{1}$$

Note that we have not placed any restriction on P(A) or P(B). Thus conditional probability
is not defined when P(A) or P(B) = 0 but independence is. Clearly, if P(A) = 0,
then A is independent of every $E \in \mathcal{S}$. Also, any event $A \in \mathcal{S}$ is independent of $\emptyset$ and $\Omega$.

Theorem 1. If A and B are independent events, then

$$P\{A \mid B\} = P(A) \quad \text{if } P(B) > 0$$

and

$$P\{B \mid A\} = P(B) \quad \text{if } P(A) > 0.$$

Theorem 2. If A and B are independent, so are A and $B^c$, $A^c$ and B, and $A^c$ and $B^c$.


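Proof.

$$\begin{aligned}
P(A^c B) &= P(B - (A \cap B)) \\
&= P(B) - P(A \cap B) \qquad \text{since } B \supseteq (A \cap B) \\
&= P(B)\{1 - P(A)\} \\
&= P(A^c)\,P(B).
\end{aligned}$$

Similarly, one proves that (i) $A^c$ and $B^c$ and (ii) A and $B^c$ are independent.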
We wish to emphasize that independence of events is not to be confused with disjoint
or mutually exclusive events. If two events, each with nonzero probability, are mutually 
exclusive, they are obviously dependent since the occurrence of one will automatically 
preclude the occurrence of the other. Similarly, if A and B are independent and PA > 0, 
PB > 0, then A and B cannot be mutually exclusive. 


Example 2. A card is chosen at random from a deck of 52 cards. Let A be the event that 
the card is an ace and B, the event that it is a club. Then 


$$P(A) = \tfrac{4}{52} = \tfrac{1}{13}, \qquad P(B) = \tfrac{13}{52} = \tfrac{1}{4},$$

$$P(AB) = P\{\text{ace of clubs}\} = \tfrac{1}{52},$$

so that A and B are independent.
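
The independence of "ace" and "club" can be checked by brute-force enumeration of the deck. The following is a minimal sketch, assuming the uniform assignment on 52 cards; the names `DECK`, `prob`, `is_ace`, and `is_club` are illustrative and not from the text.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) pairs


def prob(event):
    """Probability of an event, given as a predicate on cards."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_ace(card):
    return card[0] == "A"


def is_club(card):
    return card[1] == "clubs"


p_a = prob(is_ace)
p_b = prob(is_club)
p_ab = prob(lambda card: is_ace(card) and is_club(card))

# Independence means P(AB) = P(A) P(B).
print(p_a, p_b, p_ab, p_ab == p_a * p_b)
```

Exact rational arithmetic (`Fraction`) is used so that the product rule is tested as an equality rather than up to rounding.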


Example 3. Consider families with two children, and assume that all four possible dis- 
tributions of sex—BB, BG, GB, GG, where B stands for boy and G for girl—are equally 
likely. Let E be the event that a randomly chosen family has at most one girl and F, the 
event that the family has children of both sexes. Then 


$$P(E) = \tfrac{3}{4}, \qquad P(F) = \tfrac{1}{2}, \qquad \text{and} \qquad P(EF) = \tfrac{1}{2},$$

so that E and F are not independent.
Now consider families with three children. Assuming that each of the eight possible
sex distributions is equally likely, we have

$$P(E) = \tfrac{4}{8}, \qquad P(F) = \tfrac{6}{8}, \qquad P(EF) = \tfrac{3}{8},$$

so that E and F are independent.
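
The contrast between two-child and three-child families can be reproduced by listing all sex distributions. This is a small sketch under the equal-likelihood assumption of the example; the function name `family_probabilities` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product


def family_probabilities(n_children):
    """Return (P(E), P(F), P(EF), independent?) for families with n_children.

    E: at most one girl; F: children of both sexes.
    All 2**n_children sex distributions are taken as equally likely.
    """
    families = list(product("BG", repeat=n_children))

    def prob(event):
        return Fraction(sum(1 for fam in families if event(fam)), len(families))

    def at_most_one_girl(fam):
        return fam.count("G") <= 1

    def both_sexes(fam):
        return "B" in fam and "G" in fam

    p_e = prob(at_most_one_girl)
    p_f = prob(both_sexes)
    p_ef = prob(lambda fam: at_most_one_girl(fam) and both_sexes(fam))
    return p_e, p_f, p_ef, p_ef == p_e * p_f


print(family_probabilities(2))  # E and F dependent
print(family_probabilities(3))  # E and F independent
```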


An obvious extension of the concept of independence between two events A and B to a
given collection $\mathfrak{U}$ of events is to require that any two distinct events in $\mathfrak{U}$ be independent.

Definition 2. Let $\mathfrak{U}$ be a family of events from $\mathcal{S}$. We say that the events $\mathfrak{U}$ are pairwise
independent if and only if, for every pair of distinct events $A, B \in \mathfrak{U}$,

$$P(AB) = PA\,PB.$$

A much stronger and more useful concept is mutual or complete independence.

Definition 3. A family of events $\mathfrak{U}$ is said to be a mutually or completely independent
family if and only if, for every finite subcollection $\{A_{i_1}, A_{i_2}, \ldots, A_{i_k}\}$ of $\mathfrak{U}$, the following
relation holds:

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = \prod_{j=1}^{k} PA_{i_j}. \tag{2}$$
In what follows we will omit the adjective “mutual” or “complete” and speak of independent
events. It is clear from Definition 3 that in order to check the independence of n
events $A_1, A_2, \ldots, A_n \in \mathcal{S}$ we must check the following $2^n - n - 1$ relations:

$$\begin{aligned}
P(A_i A_j) &= PA_i\,PA_j, && i \neq j;\ i, j = 1, 2, \ldots, n, \\
P(A_i A_j A_k) &= PA_i\,PA_j\,PA_k, && i \neq j \neq k;\ i, j, k = 1, 2, \ldots, n, \\
&\ \ \vdots \\
P(A_1 A_2 \cdots A_n) &= PA_1\,PA_2 \cdots PA_n.
\end{aligned}$$

The first of these requirements is pairwise independence. Independence therefore implies
pairwise independence, but not conversely.


Example 4 (Wong [120]). Take four identical marbles. On the first, write symbols $A_1A_2A_3$.
On each of the other three, write $A_1$, $A_2$, $A_3$, respectively. Put the four marbles in an urn
and draw one at random. Let $E_i$ denote the event that the symbol $A_i$ appears on the drawn
marble. Then

$$P(E_1) = P(E_2) = P(E_3) = \tfrac{1}{2},$$

$$P(E_1E_2) = P(E_2E_3) = P(E_1E_3) = \tfrac{1}{4},$$

and

$$P(E_1E_2E_3) = \tfrac{1}{4}. \tag{3}$$

It follows that although events $E_1$, $E_2$, $E_3$ are not independent, they are pairwise
independent.
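
The marble example lends itself to a mechanical check of every product relation required by Definition 3. The sketch below is illustrative only: each marble is encoded as the set of indices i whose symbol is written on it, and the helper names (`prob`, `product_rule_holds`, `relations`) are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations

# Marble 1 carries A1 A2 A3; the other three carry A1, A2, A3 respectively.
MARBLES = [{1, 2, 3}, {1}, {2}, {3}]


def prob(indices):
    """P(E_i occurs for every i in indices) when one marble is drawn at random."""
    hits = sum(1 for marble in MARBLES if set(indices) <= marble)
    return Fraction(hits, len(MARBLES))


def product_rule_holds(indices):
    """Check P(intersection of E_i) == product of P(E_i)."""
    expected = Fraction(1)
    for i in indices:
        expected *= prob([i])
    return prob(indices) == expected


def relations(n):
    """All subcollections of size >= 2 of the indices 1..n."""
    return [c for k in range(2, n + 1) for c in combinations(range(1, n + 1), k)]


rels = relations(3)
pairwise = all(product_rule_holds(c) for c in rels if len(c) == 2)
mutual = all(product_rule_holds(c) for c in rels)
print(len(rels), pairwise, mutual)  # number of relations is 2**3 - 3 - 1
```

Enumerating subcollections this way makes the count of relations explicit: for n events there are $2^n - n - 1$ subsets of size at least two.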


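Example 5 (Kac [48], pp. 22-23). In this example $P(E_1E_2E_3) = P(E_1)P(E_2)P(E_3)$, but
$E_1$, $E_2$, $E_3$ are not pairwise independent and hence not independent. Let $\Omega = \{1, 2, 3, 4\}$,
and let $p_i$ be the probability assigned to $\{i\}$, i = 1, 2, 3, 4. Let
$p_1 = \frac{\sqrt{2}}{2} - \frac{1}{4}$, $p_2 = \frac{1}{4}$, $p_3 = \frac{3}{4} - \frac{\sqrt{2}}{2}$, $p_4 = \frac{1}{4}$.
Let $E_1 = \{1, 3\}$, $E_2 = \{2, 3\}$, $E_3 = \{3, 4\}$. Then

$$\begin{aligned}
P(E_1E_2E_3) = P\{3\} &= \frac{3}{4} - \frac{\sqrt{2}}{2} = \frac{1}{2}\left(1 - \frac{\sqrt{2}}{2}\right)\left(1 - \frac{\sqrt{2}}{2}\right) \\
&= (p_1 + p_3)(p_2 + p_3)(p_3 + p_4) \\
&= P(E_1)\,P(E_2)\,P(E_3).
\end{aligned}$$

But $P(E_1E_2) = \frac{3}{4} - \frac{\sqrt{2}}{2} \neq PE_1\,PE_2$, and it follows that $E_1$, $E_2$, $E_3$ are not independent.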
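
As a numerical sanity check on Example 5, the sketch below assumes the point probabilities $p_1 = \sqrt{2}/2 - 1/4$, $p_2 = 1/4$, $p_3 = 3/4 - \sqrt{2}/2$, $p_4 = 1/4$, which is the reading under which the identities stated in the example hold. Floating-point comparison is done with a tolerance; the variable names are illustrative.

```python
import math

r = math.sqrt(2) / 2
# Assumed point masses on Omega = {1, 2, 3, 4}; they sum to 1.
p = {1: r - 0.25, 2: 0.25, 3: 0.75 - r, 4: 0.25}


def prob(event):
    """Probability of a subset of Omega."""
    return sum(p[w] for w in event)


E1, E2, E3 = {1, 3}, {2, 3}, {3, 4}

# The triple product relation holds ...
triple_ok = math.isclose(prob(E1 & E2 & E3), prob(E1) * prob(E2) * prob(E3))
# ... but the pairwise relation for E1, E2 fails.
pair_ok = math.isclose(prob(E1 & E2), prob(E1) * prob(E2))

print(triple_ok, pair_ok)
```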
Example 6. A die is rolled repeatedly until a 6 turns up. We will show that event A, that
“a 6 will eventually show up,” is certain to occur. Let $A_k$ be the event that a 6 will show up
for the first time on the kth throw. Let $A = \bigcup_{k=1}^{\infty} A_k$. Then

$$PA_k = \frac{1}{6}\left(\frac{5}{6}\right)^{k-1}, \qquad k = 1, 2, \ldots,$$

and

$$PA = \frac{1}{6}\sum_{k=1}^{\infty}\left(\frac{5}{6}\right)^{k-1} = \frac{1}{6}\cdot\frac{1}{1 - \frac{5}{6}} = 1.$$

Alternatively, we can use the corollary to Theorem 1.3.6. Let $B_n$ be the event that a 6 does
not show up on the first n trials. Clearly $B_{n+1} \subseteq B_n$, and we have $A^c = \bigcap_{n=1}^{\infty} B_n$. Thus

$$1 - PA = PA^c = P\left(\bigcap_{n=1}^{\infty} B_n\right) = \lim_{n\to\infty} P(B_n) = \lim_{n\to\infty}\left(\frac{5}{6}\right)^{n} = 0.$$
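
The two arguments in Example 6 agree term by term: the partial sum of the first-occurrence probabilities up to throw n equals $1 - (5/6)^n$. A short illustrative sketch (the function name `prob_six_within` is an assumption of this sketch):

```python
from fractions import Fraction


def prob_six_within(n):
    """P(the first 6 appears on or before throw n), summed term by term."""
    return sum(Fraction(1, 6) * Fraction(5, 6) ** (k - 1) for k in range(1, n + 1))


for n in (1, 10, 50):
    # Partial sum of P(A_k) equals 1 - P(B_n) = 1 - (5/6)**n.
    assert prob_six_within(n) == 1 - Fraction(5, 6) ** n
    print(n, float(prob_six_within(n)))
```

The printed values increase toward 1, in line with the limit computed in the example.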


Example 7. A slip of paper is given to person A, who marks it with either a plus or a 
minus sign; the probability of her writing a plus sign is 1/3. A passes the slip to B, who 
may either leave it alone or change the sign before passing it to C. Next, C passes the slip 
to D after perhaps changing the sign; finally, D passes it to a referee after perhaps changing 
the sign. The referee sees a plus sign on the slip. It is known that B, C, and D each change 
the sign with probability 2/3. We shall compute the probability that A originally wrote a 
plus. 

Let N be the event that A wrote a plus sign, and M, the event that she wrote a minus 
sign. Let E be the event that the referee saw a plus sign on the slip. We have 


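$$P\{N \mid E\} = \frac{P(N)\,P\{E \mid N\}}{P(M)\,P\{E \mid M\} + P(N)\,P\{E \mid N\}}.$$

Now

$$\begin{aligned}
P\{E \mid N\} &= P\{\text{the plus sign was either not changed or changed exactly twice}\} \\
&= \left(\tfrac{1}{3}\right)^3 + 3\left(\tfrac{2}{3}\right)^2\left(\tfrac{1}{3}\right) = \tfrac{13}{27}
\end{aligned}$$

and

$$\begin{aligned}
P\{E \mid M\} &= P\{\text{the minus sign was changed either once or three times}\} \\
&= 3\left(\tfrac{2}{3}\right)\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^3 = \tfrac{14}{27}.
\end{aligned}$$

It follows that

$$P\{N \mid E\} = \frac{\tfrac{1}{3}\cdot\tfrac{13}{27}}{\tfrac{2}{3}\cdot\tfrac{14}{27} + \tfrac{1}{3}\cdot\tfrac{13}{27}} = \frac{13}{41}.$$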
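
The Bayes computation of Example 7 can be carried out with exact rational arithmetic. The sketch below assumes that B, C, and D change the sign independently of one another, each with probability 2/3, so that the number of changes is binomial; the names `CHANGE`, `P_PLUS`, and `prob_changes` are illustrative.

```python
from fractions import Fraction
from math import comb

CHANGE = Fraction(2, 3)  # probability that each of B, C, D changes the sign
P_PLUS = Fraction(1, 3)  # probability that A writes a plus sign


def prob_changes(k, n=3):
    """P(exactly k of the n intermediaries change the sign), assuming independence."""
    return comb(n, k) * CHANGE**k * (1 - CHANGE) ** (n - k)


# A plus survives if it is changed an even number of times (0 or 2).
p_e_given_n = prob_changes(0) + prob_changes(2)
# A minus becomes a plus if it is changed an odd number of times (1 or 3).
p_e_given_m = prob_changes(1) + prob_changes(3)

posterior = (P_PLUS * p_e_given_n) / (
    (1 - P_PLUS) * p_e_given_m + P_PLUS * p_e_given_n
)
print(p_e_given_n, p_e_given_m, posterior)
```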
PROBLEMS 1.6


1. A biased coin is tossed until a head appears for the first time. Let p be the probability 
of a head, 0 < p < 1. What is the probability that the number of tosses required is 
odd? Even? 

2. Let A and B be two independent events defined on some probability space, and let 
PA = 1/3, PB = 3/4. Find (a) P(AUB), (b) P{A | AUB}, and (c) P{B | AUB}. 

3. Let $A_1$, $A_2$, and $A_3$ be three independent events. Show that $A_1^c$, $A_2^c$, and $A_3^c$ are
independent.

4. A biased coin with probability p, 0 < p < 1, of success (heads) is tossed until for the 
first time the same result occurs three times in succession (i.e., three heads or three 
tails in succession). Find the probability that the game will end at the seventh throw. 

5. A box contains 20 black and 30 green balls. One ball at a time is drawn at random, 
its color is noted, and the ball is then replaced in the box for the next draw. 


(a) Find the probability that the first green ball is drawn on the fourth draw. 

(b) Find the probability that the third and fourth green balls are drawn on the sixth 
and ninth draws, respectively. 

(c) Let N be the trial at which the fifth green ball is drawn. Find the probability that
the fifth green ball is drawn on the nth draw. (Note that N takes values 5, 6, 7, ....)

6. An urn contains four red and four black balls. A sample of two balls is drawn at 
random. If both balls drawn are of the same color, these balls are set aside and a new 
sample is drawn. If the two balls drawn are of different colors, they are returned to 
the urn and another sample is drawn. Assume that the draws are independent and 
that the same sampling plan is pursued at each stage until all balls are drawn. 

(a) Find the probability that at least n samples are drawn before two balls of the 
same color appear. 

(b) Find the probability that after the first two samples are drawn four balls are left, 
two black and two red. 

7. Let A, B, and C be three boxes with three, four, and five cells, respectively. There are
three yellow balls numbered 1 to 3, four green balls numbered 1 to 4, and five red
balls numbered 1 to 5. The yellow balls are placed at random in box A, the green in
B, and the red in C, with no cell receiving more than one ball. Find the probability
that only one of the boxes will show no matches.

8. A pond contains red and golden fish. There are 3000 red and 7000 golden fish, 
of which 200 and 500, respectively, are tagged. Find the probability that a random 
sample of 100 red and 200 golden fish will show 15 and 20 tagged fish, respectively. 

9. Let $(\Omega, \mathcal{S}, P)$ be a probability space. Let $A, B, C \in \mathcal{S}$ with PB and PC > 0. If B and
C are independent show that

$$P\{A \mid B\} = P\{A \mid B \cap C\}\,PC + P\{A \mid B \cap C^c\}\,PC^c.$$

Conversely, if this relation holds, $P\{A \mid BC\} \neq P\{A \mid B\}$, and PA > 0, then B and C
are independent (Strait [111]).
10. Show that the converse of Theorem 2 also holds. Thus A and B are independent if,
and only if, A and $B^c$ are independent, and so on.

11. A lot of five identical batteries is life tested. The probability assignment is assumed
to be

$$P(A) = \int_A \lambda e^{-\lambda x}\,dx$$

for any event $A \subseteq [0, \infty)$, where $\lambda > 0$ is a known constant. Thus the probability that
a battery fails after time t is given by

$$P(t, \infty) = \int_t^{\infty} \lambda e^{-\lambda x}\,dx, \qquad t \geq 0.$$

If the times to failure of the batteries are independent, what is the probability that at
least one battery will be operating after $t_0$ hours?

12. On $\Omega = (a, b)$, $-\infty < a < b < \infty$, each subinterval is assigned a probability proportional
to the length of the interval. Find a necessary and sufficient condition for two
events to be independent.

13. A game of craps is played with a pair of fair dice as follows. A player rolls the dice.
If a sum of 7 or 11 shows up, the player wins; if a sum of 2, 3, or 12 shows up, the
player loses. Otherwise the player continues to roll the pair of dice until the sum is
either 7 or the first number rolled. In the former case the player loses and in the latter
the player wins.

(a) Find the probability that the player wins on the nth roll. 

(b) Find the probability that the player wins the game. 


(c) What is the probability that the game ends on: (i) the first roll, (ii) second roll, 
and (iii) third roll? 


RANDOM VARIABLES AND THEIR 
PROBABILITY DISTRIBUTIONS 


2.1 INTRODUCTION 


In Chapter 1 we dealt essentially with random experiments which can be described by
finite sample spaces. We studied the assignment and computation of probabilities of
events. In practice, one observes a function defined on the space of outcomes. Thus, if
a coin is tossed n times, one is not interested in knowing which of the $2^n$ n-tuples in the
sample space has occurred. Rather, one would like to know the number of heads in n tosses.
In games of chance one is interested in the net gain or loss of a certain player. Actually, in
Chapter 1 we were concerned with such functions without defining the term random variable.
Here we study the notion of a random variable and examine some of its properties.

In Section 2.2 we define a random variable, while in Section 2.3 we study the notion 
of probability distribution of a random variable. Section 2.4 deals with some special types 
of random variables, and Section 2.5 considers functions of a random variable and their 
induced distributions. 

The fundamental difference between a random variable and a real-valued function of a
real variable is the associated notion of a probability distribution. Nevertheless our knowledge
of advanced calculus or real analysis is the basic tool in the study of random variables
and their probability distributions.


2.2 RANDOM VARIABLES 
In Chapter 1 we studied properties of a set function P defined on a sample space $(\Omega, \mathcal{S})$.
Since P is a set function, it is not very easy to handle; we cannot perform arithmetic or
algebraic operations on sets. Moreover, in practice one frequently observes some function
of elementary events. When a coin is tossed repeatedly, which replication resulted in heads
is not of much interest. Rather one is interested in the number of heads, and consequently
the number of tails, that appear in, say, n tossings of the coin. It is therefore desirable to
introduce a point function on the sample space. We can then use our knowledge of calculus
or real analysis to study properties of P.


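Definition 1. Let $(\Omega, \mathcal{S})$ be a sample space. A finite, single-valued function X which maps
$\Omega$ into $\mathcal{R}$ is called a random variable (RV) if the inverse images under X of all Borel sets
in $\mathcal{R}$ are events, that is, if

$$X^{-1}(B) = \{\omega : X(\omega) \in B\} \in \mathcal{S} \quad \text{for all } B \in \mathfrak{B}. \tag{1}$$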


In order to verify whether a real-valued function on $(\Omega, \mathcal{S})$ is an RV, it is not necessary
to check that (1) holds for all Borel sets $B \in \mathfrak{B}$. It suffices to verify (1) for any class $\mathfrak{A}$
of subsets of $\mathcal{R}$ which generates $\mathfrak{B}$. By taking $\mathfrak{A}$ to be the class of semiclosed intervals
$(-\infty, x]$, $x \in \mathcal{R}$, we get the following result.

Theorem 1. X is an RV if and only if for each $x \in \mathcal{R}$

$$\{\omega : X(\omega) \leq x\} = \{X \leq x\} \in \mathcal{S}. \tag{2}$$

Remark 1. Note that the notion of probability does not enter into the definition of an RV.


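Remark 2. If X is an RV, the sets $\{X = x\}$, $\{a < X \leq b\}$, $\{X < x\}$, $\{a \leq X < b\}$,
$\{a < X < b\}$, $\{a \leq X \leq b\}$ are all events. Moreover, we could have used any of these intervals
to define an RV. For example, we could have used the following equivalent definition: X
is an RV if and only if

$$\{\omega : X(\omega) < x\} \in \mathcal{S} \quad \text{for all } x \in \mathcal{R}. \tag{3}$$

We have

$$\{X < x\} = \bigcup_{n=1}^{\infty} \left\{X \leq x - \tfrac{1}{n}\right\} \tag{4}$$

and

$$\{X \leq x\} = \bigcap_{n=1}^{\infty} \left\{X < x + \tfrac{1}{n}\right\}. \tag{5}$$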


Remark 3. In practice (1) or (2) is a technical condition in the definition of an RV which
the reader may ignore and think of RVs simply as real-valued functions defined on $\Omega$. It
should be emphasized though that there do exist subsets of $\mathcal{R}$ which do not belong to $\mathfrak{B}$
and hence there exist real-valued functions defined on $\Omega$ which are not RVs, but the reader
will not encounter them in practical applications.
Example 1. For any set $A \subseteq \Omega$, define

$$I_A(\omega) = \begin{cases} 0, & \omega \notin A, \\ 1, & \omega \in A. \end{cases}$$

$I_A(\omega)$ is called the indicator function of set A. $I_A$ is an RV if and only if $A \in \mathcal{S}$.


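Example 2. Let $\Omega = \{H, T\}$ and $\mathcal{S}$ be the class of all subsets of $\Omega$. Define X by X(H) = 1,
X(T) = 0. Then

$$X^{-1}(-\infty, x] = \begin{cases} \emptyset & \text{if } x < 0, \\ \{T\} & \text{if } 0 \leq x < 1, \\ \{H, T\} & \text{if } 1 \leq x, \end{cases}$$

and we see that X is an RV.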


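Example 3. Let $\Omega = \{HH, TT, HT, TH\}$ and $\mathcal{S}$ be the class of all subsets of $\Omega$.
Define X by

$$X(\omega) = \text{number of H's in } \omega.$$

Then X(HH) = 2, X(HT) = X(TH) = 1, and X(TT) = 0.

$$X^{-1}(-\infty, x] = \begin{cases} \emptyset, & x < 0, \\ \{TT\}, & 0 \leq x < 1, \\ \{TT, HT, TH\}, & 1 \leq x < 2, \\ \Omega, & 2 \leq x. \end{cases}$$

Thus X is an RV.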
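
The inverse images in the two-coin example can be listed directly. The following sketch is illustrative; `OMEGA`, `X`, and `inverse_image` are names chosen here, and the finite sample space is represented as a list of strings.

```python
OMEGA = ["HH", "HT", "TH", "TT"]


def X(w):
    """Number of heads in the outcome w."""
    return w.count("H")


def inverse_image(x):
    """The event {w : X(w) <= x}, i.e. the inverse image of (-inf, x]."""
    return {w for w in OMEGA if X(w) <= x}


for x in (-1, 0, 0.5, 1, 1.5, 2, 3):
    print(x, sorted(inverse_image(x)))
```

Since the class of events here is the set of all subsets of the sample space, every set returned is automatically an event, which is the content of the statement that X is an RV.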
Remark 4. Let $(\Omega, \mathcal{S})$ be a discrete sample space; that is, let $\Omega$ be a countable set of points
and $\mathcal{S}$ be the class of all subsets of $\Omega$. Then every numerical valued function defined on
$(\Omega, \mathcal{S})$ is an RV.

Example 4. Let $\Omega = [0, 1]$ and $\mathcal{S} = \mathfrak{B} \cap [0, 1]$ be the $\sigma$-field of Borel sets on [0, 1]. Define
X on $\Omega$ by

$$X(\omega) = \omega, \qquad \omega \in [0, 1].$$

Clearly X is an RV. Any Borel subset of $\Omega$ is an event.

Remark 5. Let X be an RV defined on $(\Omega, \mathcal{S})$ and a, b be constants. Then aX + b is also
an RV on $(\Omega, \mathcal{S})$. Moreover, $X^2$ is an RV and so also is 1/X, provided that $\{X = 0\} = \emptyset$.
For a general result see Theorem 2.5.1.
PROBLEMS 2.2

1. Let X be the number of heads in three tosses of a coin. What is $\Omega$? What are the values
that X assigns to points of $\Omega$? What are the events {X < 2.75}, {0.5 < X < 1.72}?

2. A die is tossed two times. Let X be the sum of face values on the two tosses and Y
be the absolute value of the difference in face values. What is $\Omega$? What values do X
and Y assign to points of $\Omega$? Check to see whether X and Y are random variables.

3. Let X be an RV. Is |X| also an RV? If X is an RV that takes only nonnegative values,
is $\sqrt{X}$ also an RV?

4. A die is rolled five times. Let X be the sum of face values. Write the events {X = 4},
{X = 6}, {X = 30}, {X > 29}.

5. Let $\Omega = [0, 1]$ and $\mathcal{S}$ be the Borel $\sigma$-field of subsets of $\Omega$. Define X on $\Omega$ as follows:
$X(\omega) = \omega$ if $0 \leq \omega \leq 1/2$, and $X(\omega) = \omega - 1/2$ if $1/2 < \omega \leq 1$. Is X an RV? If so,
what is the event $\{\omega : X(\omega) \in (1/4, 1/2)\}$?

6. Let $\mathfrak{A}$ be a class of subsets of $\mathcal{R}$ which generates $\mathfrak{B}$. Show that X is an RV on $\Omega$ if
and only if $X^{-1}(A) \in \mathcal{S}$ for all $A \in \mathfrak{A}$.


2.3 PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE 


In Section 2.2 we introduced the concept of an RV and noted that the concept of probability
on the sample space was not used in this definition. In practice, however, random
variables are of interest only when they are defined on a probability space. Let $(\Omega, \mathcal{S}, P)$
be a probability space, and let X be an RV defined on it.


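Theorem 1. The RV X defined on the probability space $(\Omega, \mathcal{S}, P)$ induces a probability
space $(\mathcal{R}, \mathfrak{B}, Q)$ by means of the correspondence

$$Q(B) = P\{X^{-1}(B)\} = P\{\omega : X(\omega) \in B\} \quad \text{for all } B \in \mathfrak{B}. \tag{1}$$

We write $Q = PX^{-1}$ and call Q or $PX^{-1}$ the (probability) distribution of X.

Proof. Clearly $Q(B) \geq 0$ for all $B \in \mathfrak{B}$, and also $Q(\mathcal{R}) = P\{X \in \mathcal{R}\} = P(\Omega) = 1$.
Let $B_i \in \mathfrak{B}$, i = 1, 2, ..., with $B_i \cap B_j = \emptyset$, $i \neq j$. Since the inverse image of a disjoint
union of Borel sets is the disjoint union of their inverse images, we have

$$Q\left(\sum_{i=1}^{\infty} B_i\right) = P\left\{X^{-1}\left(\sum_{i=1}^{\infty} B_i\right)\right\} = P\left\{\sum_{i=1}^{\infty} X^{-1}(B_i)\right\} = \sum_{i=1}^{\infty} PX^{-1}(B_i) = \sum_{i=1}^{\infty} Q(B_i).$$

It follows that $(\mathcal{R}, \mathfrak{B}, Q)$ is a probability space, and the proof is complete.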
We note that Q is a set function, and set functions are not easy to handle. It is therefore
more practical to use (2.2.2) since then $Q(-\infty, x]$ is a point function. Let us first introduce
and study some properties of a special point function on $\mathcal{R}$.

Definition 1. A real-valued function F defined on $(-\infty, \infty)$ that is nondecreasing, right
continuous, and satisfies

$$F(-\infty) = 0 \quad \text{and} \quad F(+\infty) = 1$$

is called a distribution function (DF).
Remark 1. Recall that if F is a nondecreasing function on $\mathcal{R}$, then $F(x-) = \lim_{t \uparrow x} F(t)$,
$F(x+) = \lim_{t \downarrow x} F(t)$ exist and are finite. Also, $F(+\infty)$ and $F(-\infty)$ exist as $\lim_{t \uparrow +\infty} F(t)$
and $\lim_{t \downarrow -\infty} F(t)$, respectively. In general,

$$F(x-) \leq F(x) \leq F(x+),$$

and x is a jump point of F if and only if F(x+) and F(x−) exist but are unequal. Thus a
nondecreasing function F has only jump discontinuities. If we define

$$F^*(x) = F(x+) \quad \text{for all } x,$$

we see that $F^*$ is nondecreasing and right continuous on $\mathcal{R}$. Thus in Definition 1 the
nondecreasing part is very important. Some authors demand left continuity in the definition
of a DF instead of right continuity.
Theorem 2. The set of discontinuity points of a DF F is at most countable.

Proof. Let (a, b] be a finite interval with at least n discontinuity points:

$$a < x_1 < x_2 < \cdots < x_n \leq b.$$

Then

$$F(a) \leq F(x_1-) < F(x_1) \leq \cdots \leq F(x_n-) < F(x_n) \leq F(b).$$

Let $p_k = F(x_k) - F(x_k-)$, k = 1, 2, ..., n. Clearly,

$$\sum_{k=1}^{n} p_k \leq F(b) - F(a),$$

and it follows that the number of points x in (a, b] with jump $p(x) > \varepsilon > 0$ is at most
$\varepsilon^{-1}\{F(b) - F(a)\}$. Thus, for every integer N, the number of discontinuity points with
jump greater than 1/N is finite. It follows that there are no more than a countable number
of discontinuity points in every finite interval (a, b]. Since $\mathcal{R}$ is a countable union of such
intervals, the proof is complete.
Definition 2. Let X be an RV defined on $(\Omega, \mathcal{S}, P)$. Define a point function F(.) on $\mathcal{R}$ by
using (1), namely,

$$F(x) = Q(-\infty, x] = P\{\omega : X(\omega) \leq x\} \quad \text{for all } x \in \mathcal{R}. \tag{2}$$

The function F is called the distribution function of RV X.

If there is no confusion, we will write

$$F(x) = P\{X \leq x\}.$$

The following result justifies our calling F as defined by (2) a DF.


Theorem 3. The function F defined in (2) is indeed a DF.

Proof. Let $x_1 < x_2$. Then $(-\infty, x_1] \subset (-\infty, x_2]$, and we have

$$F(x_1) = P\{X \leq x_1\} \leq P\{X \leq x_2\} = F(x_2).$$

Since F is nondecreasing, it is sufficient to show that for any sequence of numbers $x_n \downarrow x$,
$x_1 > x_2 > \cdots > x_n > \cdots > x$, $F(x_n) \to F(x)$. Let $A_k = \{\omega : X(\omega) \in (x, x_k]\}$. Then $A_k \in \mathcal{S}$
and $A_k \downarrow$. Also,

$$\lim_{k \to \infty} A_k = \bigcap_{k=1}^{\infty} A_k = \emptyset,$$

since none of the intervals $(x, x_k]$ contains x. It follows that $\lim_{k \to \infty} P(A_k) = 0$. But

$$P(A_k) = P\{X \leq x_k\} - P\{X \leq x\} = F(x_k) - F(x),$$

so that

$$\lim_{k \to \infty} F(x_k) = F(x)$$

and F is right continuous.

Finally, let $\{x_n\}$ be a sequence of numbers decreasing to $-\infty$. Then

$$\{X \leq x_n\} \supseteq \{X \leq x_{n+1}\} \quad \text{for each } n$$

and

$$\lim_{n \to \infty} \{X \leq x_n\} = \bigcap_{n=1}^{\infty} \{X \leq x_n\} = \emptyset.$$

Therefore,

$$F(-\infty) = \lim_{n \to \infty} P\{X \leq x_n\} = P\left\{\lim_{n \to \infty} \{X \leq x_n\}\right\} = 0.$$

Similarly,

$$F(+\infty) = \lim_{x_n \to \infty} P\{X \leq x_n\} = 1,$$

and the proof is complete.


The next result, stated without proof, establishes a correspondence between the induced
probability Q on $(\mathcal{R}, \mathfrak{B})$ and a point function F defined on $\mathcal{R}$.

Theorem 4. Given a probability Q on $(\mathcal{R}, \mathfrak{B})$, there exists a distribution function F
satisfying

$$Q(-\infty, x] = F(x) \quad \text{for all } x \in \mathcal{R}, \tag{3}$$

and, conversely, given a DF F, there exists a unique probability Q defined on $(\mathcal{R}, \mathfrak{B})$ that
satisfies (3).

For proof see Chung [15, pp. 23-24].


Theorem 5. Every DF is the DF of an RV on some probability space.

Proof. Let F be a DF. From Theorem 4 it follows that there exists a unique probability Q
defined on $\mathcal{R}$ that satisfies

$$Q(-\infty, x] = F(x) \quad \text{for all } x \in \mathcal{R}.$$

Let $(\mathcal{R}, \mathfrak{B}, Q)$ be the probability space on which we define

$$X(\omega) = \omega, \qquad \omega \in \mathcal{R}.$$

Then

$$Q\{\omega : X(\omega) \leq x\} = Q(-\infty, x] = F(x),$$

and F is the DF of RV X.

Remark 2. If X is an RV on $(\Omega, \mathcal{S}, P)$, we have seen (Theorem 3) that $F(x) = P\{X \leq x\}$ is a
DF associated with X. Theorem 5 assures us that to every DF F we can associate some RV.
Thus, given an RV, there exists a DF, and conversely. In this book, when we speak of
an RV we will assume that it is defined on some probability space.


Example 1. Let X be defined on $(\Omega, \mathcal{S}, P)$ by

$$X(\omega) = c \quad \text{for all } \omega \in \Omega.$$

Then

$$P\{X = c\} = 1,$$

$$F(x) = Q(-\infty, x] = P\{X^{-1}(-\infty, x]\} = 0 \quad \text{if } x < c$$

and

$$F(x) = 1 \quad \text{if } x \geq c.$$


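Example 2. Let $\Omega = \{H, T\}$ and X be defined by

$$X(H) = 1, \qquad X(T) = 0.$$

If P assigns equal mass to {H} and {T}, then

$$P\{X = 0\} = \tfrac{1}{2} = P\{X = 1\}$$

and

$$F(x) = Q(-\infty, x] = \begin{cases} 0, & x < 0, \\ \tfrac{1}{2}, & 0 \leq x < 1, \\ 1, & 1 \leq x. \end{cases}$$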


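Example 3. Let $\Omega = \{(i, j) : i, j \in \{1, 2, 3, 4, 5, 6\}\}$ and $\mathcal{S}$ be the set of all subsets of $\Omega$. Let
$P\{(i, j)\} = 1/6^2$ for all $6^2$ pairs (i, j) in $\Omega$. Define

$$X(i, j) = i + j, \qquad 1 \leq i, j \leq 6.$$

Then,

$$F(x) = Q(-\infty, x] = P\{X \leq x\} = \begin{cases} 0, & x < 2, \\ \tfrac{1}{36}, & 2 \leq x < 3, \\ \tfrac{3}{36}, & 3 \leq x < 4, \\ \tfrac{6}{36}, & 4 \leq x < 5, \\ \ \vdots \\ \tfrac{35}{36}, & 11 \leq x < 12, \\ 1, & 12 \leq x. \end{cases}$$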
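
The step values of the DF of the sum of two dice can be tabulated by counting outcomes. This is a minimal sketch assuming the uniform assignment 1/36 on the 36 pairs; `OUTCOMES` and `F` are illustrative names.

```python
from fractions import Fraction
from itertools import product

OUTCOMES = list(product(range(1, 7), repeat=2))  # the 36 pairs (i, j)


def F(x):
    """DF of X(i, j) = i + j: the proportion of pairs with i + j <= x."""
    favorable = sum(1 for i, j in OUTCOMES if i + j <= x)
    return Fraction(favorable, len(OUTCOMES))


# F is a step function; it only changes at the integers 2, 3, ..., 12.
for s in range(2, 13):
    print(s, F(s))
```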


Example 4. We return to Example 2.2.4. For every subinterval I of [0, 1] let P(I) be the
length of the interval. Then $(\Omega, \mathcal{S}, P)$ is a probability space, and the DF of RV $X(\omega) = \omega$,
$\omega \in \Omega$, is given by F(x) = 0 if x < 0, $F(x) = P\{\omega : X(\omega) \leq x\} = P([0, x]) = x$ if $x \in [0, 1]$,
and F(x) = 1 if $x \geq 1$.
PROBLEMS 2.3

1. Write the DF of RV X defined in Problem 2.2.1, assuming that the coin is fair.
2. What is the DF of RV Y defined in Problem 2.2.2, assuming that the die is not loaded?
3. Do the following functions define DFs?
(a) F(x) = 0 if x < 0, = x if $0 \leq x < 1/2$, and = 1 if $x \geq 1/2$.
(b) $F(x) = (1/\pi)\tan^{-1} x$, $-\infty < x < \infty$.
(c) F(x) = 0 if $x \leq 1$, and = 1 − (1/x) if 1 < x.
(d) $F(x) = 1 - e^{-x}$ if $x \geq 0$, and = 0 if x < 0.
4. Let X be an RV with DF F. 
(a) If F is the DF defined in Problem 3(a), find P{X > 4}, P{3 <X < 3}. 
(b) If F is the DF defined in Problem 3(d), find $P\{-\infty < X < 2\}$.


2.4 DISCRETE AND CONTINUOUS RANDOM VARIABLES 


Let X be an RV defined on some fixed, but otherwise arbitrary, probability space $(\Omega, \mathcal{S}, P)$,
and let F be the DF of X. In this book, we shall restrict ourselves mainly to two cases,
namely, the case in which the RV assumes at most a countable number of values and
hence its DF is a step function and that in which the DF F is (absolutely) continuous.


Definition 1. An RV X defined on $(\Omega, \mathcal{S}, P)$ is said to be of the discrete type, or simply
discrete, if there exists a countable set $E \subseteq \mathcal{R}$ such that $P\{X \in E\} = 1$. The points of E
which have positive mass are called jump points or points of increase of the DF of X, and
their probabilities are called jumps of the DF.

Note that $E \in \mathfrak{B}$ since every one-point set is in $\mathfrak{B}$. Indeed, if $x \in \mathcal{R}$, then

$$\{x\} = \bigcap_{n=1}^{\infty} \left\{x - \tfrac{1}{n} < y \leq x + \tfrac{1}{n}\right\}. \tag{1}$$

Thus $\{X \in E\}$ is an event. Let X take on the value $x_i$ with probability $p_i$ (i = 1, 2, ...).
We have

$$P\{\omega : X(\omega) = x_i\} = p_i, \qquad i = 1, 2, \ldots, \qquad p_i \geq 0 \text{ for all } i.$$

Then $\sum_{i=1}^{\infty} p_i = 1$.


Definition 2. The collection of numbers $\{p_i\}$ satisfying $P\{X = x_i\} = p_i \geq 0$, for all i and
$\sum_{i=1}^{\infty} p_i = 1$, is called the probability mass function (PMF) of RV X.

The DF F of X is given by

$$F(x) = P\{X \leq x\} = \sum_{x_i \leq x} p_i. \tag{2}$$

If $I_A$ denotes the indicator function of the set A, we may write

$$X(\omega) = \sum_{i=1}^{\infty} x_i I_{[X = x_i]}(\omega). \tag{3}$$

Let us define a function $\varepsilon(x)$ as follows:

$$\varepsilon(x) = \begin{cases} 1, & x \geq 0, \\ 0, & x < 0. \end{cases}$$

Then we have

$$F(x) = \sum_{i=1}^{\infty} p_i\,\varepsilon(x - x_i). \tag{4}$$


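Example 1. The simplest example is that of an RV X degenerate at c, P{X = c} = 1:

$$F(x) = \varepsilon(x - c) = \begin{cases} 0, & x < c, \\ 1, & x \geq c. \end{cases}$$

Example 2. A box contains good and defective items. If an item drawn is good, we assign
the number 1 to the drawing; otherwise, the number 0. Let p be the probability of drawing
at random a good item. Then

$$P\{X = 1\} = p, \qquad P\{X = 0\} = 1 - p,$$

and

$$F(x) = P\{X \leq x\} = \begin{cases} 0, & x < 0, \\ 1 - p, & 0 \leq x < 1, \\ 1, & 1 \leq x. \end{cases}$$

Example 3. Let X be an RV with PMF

$$P\{X = k\} = \frac{6}{\pi^2} \cdot \frac{1}{k^2}, \qquad k = 1, 2, \ldots.$$

Then,

$$F(x) = \frac{6}{\pi^2} \sum_{k=1}^{\infty} \frac{1}{k^2}\,\varepsilon(x - k).$$

Theorem 1. Let $\{p_k\}$ be a collection of nonnegative real numbers such that $\sum_{k=1}^{\infty} p_k = 1$.
Then $\{p_k\}$ is the PMF of some RV X.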
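
The representation of a discrete DF as a weighted sum of unit steps can be sketched in code. The example below assumes a PMF with finite support given as a dictionary, and uses the good/defective box with the hypothetical value p = 1/4; `eps` and `make_df` are illustrative names.

```python
from fractions import Fraction


def eps(x):
    """Unit step: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def make_df(pmf):
    """Build F(x) = sum_i p_i * eps(x - x_i) from a finite PMF {x_i: p_i}."""
    assert all(p >= 0 for p in pmf.values())
    assert sum(pmf.values()) == 1

    def F(x):
        return sum(p * eps(x - xi) for xi, p in pmf.items())

    return F


p = Fraction(1, 4)  # hypothetical probability of drawing a good item
F = make_df({1: p, 0: 1 - p})
print(F(-0.5), F(0), F(0.5), F(1), F(2))
```

Because `eps` is 1 at zero, the resulting F is right continuous, with a jump of size $p_i$ at each $x_i$.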
We next consider RVs associated with DFs that have no jump points. The DF of such
an RV is continuous. We shall restrict our attention to a special subclass of such RVs.

Definition 3. Let X be an RV defined on $(\Omega, \mathcal{S}, P)$ with DF F. Then X is said to be of
the continuous type (or, simply, continuous) if F is absolutely continuous, that is, if there
exists a nonnegative function f(x) such that for every real number x we have

$$F(x) = \int_{-\infty}^{x} f(t)\,dt. \tag{5}$$

The function f is called the probability density function (PDF) of the RV X.


Note that $f \geq 0$ and satisfies $\lim_{x \to +\infty} F(x) = F(+\infty) = \int_{-\infty}^{\infty} f(t)\,dt = 1$. Let a and b
be any two real numbers with a < b. Then

$$P\{a < X \leq b\} = F(b) - F(a) = \int_a^b f(t)\,dt.$$

In view of remarks following Definition 2.2.1, the following result holds.

Theorem 2. Let X be an RV of the continuous type with PDF f. Then for every Borel set
$B \in \mathfrak{B}$

$$P(B) = \int_B f(t)\,dt. \tag{6}$$

If F is absolutely continuous and f is continuous at x, we have

$$F'(x) = \frac{dF(x)}{dx} = f(x). \tag{7}$$


Theorem 3. Every nonnegative real function f that is integrable over $\mathcal{R}$ and satisfies

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

is the PDF of some continuous type RV X.

Proof. In view of Theorem 2.3.5 it suffices to show that there corresponds a DF F to f.
Define

$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad x \in \mathcal{R}.$$

Then $F(-\infty) = 0$, $F(+\infty) = 1$, and, if $x_2 > x_1$,

$$F(x_2) = \left(\int_{-\infty}^{x_1} + \int_{x_1}^{x_2}\right) f(t)\,dt \geq \int_{-\infty}^{x_1} f(t)\,dt = F(x_1).$$

Finally, F is (absolutely) continuous and hence continuous from the right.


Remark 1. In the discrete case, P{X = a} is the probability that X takes the value a. In the 
continuous case, f(a) is not the probability that X takes the value a. Indeed, if X is of the 
continuous type, it assumes every value with probability 0. 


Theorem 4. Let X be any RV. Then

$$P\{X = a\} = \lim_{\substack{t \to a \\ t < a}} P\{t < X \leq a\}. \tag{8}$$

Proof. Let $t_1 < t_2 < \cdots < a$, $t_n \to a$, and write

$$A_n = \{t_n < X \leq a\}.$$

Then $A_n$ is a nonincreasing sequence of events which converges to $\bigcap_{n=1}^{\infty} A_n = \{X = a\}$. It
follows that $\lim_{n \to \infty} PA_n = P\{X = a\}$.

Remark 2. Since $P\{t < X \leq a\} = F(a) - F(t)$, it follows that

$$\lim_{\substack{t \to a \\ t < a}} P\{t < X \leq a\} = P\{X = a\} = F(a) - \lim_{\substack{t \to a \\ t < a}} F(t) = F(a) - F(a-).$$

Thus F has a jump discontinuity at a if and only if P{X = a} > 0, that is, F is continuous
at a if and only if P{X = a} = 0. If X is an RV of the continuous type, P{X = a} = 0 for
all $a \in \mathcal{R}$. Moreover,

$$P\{X \in \mathcal{R} - \{a\}\} = 1.$$

This justifies Remark 1.3.4.


Remark 3. The set of real numbers x for which a DF F increases is called the support
of F. Let X be the RV with DF F, and let S be the support of F. Then $P(X \in S) = 1$ and
$P(X \in S^c) = 0$. The set of positive integers is the support of the DF in Example 3, and the
open interval (0, 1) is the support of F in Example 4 below.


Example 4. Let X be an RV with DF F given by (Fig. 1)

$$F(x) = \begin{cases} 0, & x \leq 0, \\ x, & 0 < x \leq 1, \\ 1, & 1 < x. \end{cases}$$

[Fig. 1: graph of F(x) against x.]

Differentiating F with respect to x at continuity points of f, we get

$$f(x) = F'(x) = \begin{cases} 0, & x < 0 \text{ or } x > 1, \\ 1, & 0 < x < 1. \end{cases}$$

The function f is not continuous at x = 0 or at x = 1 (Fig. 2). We may define f(0) and f(1)
in any manner. Choosing f(0) = f(1) = 0, we have

$$f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise.} \end{cases}$$

[Fig. 2: graph of f(x) against x.]

Then

$$P\{0.4 < X \leq 0.6\} = F(0.6) - F(0.4) = 0.2.$$

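Example 5. Let X have the triangular PDF (Fig. 3)

$$f(x) = \begin{cases} x, & 0 < x \leq 1, \\ 2 - x, & 1 < x \leq 2, \\ 0, & \text{otherwise.} \end{cases}$$

[Fig. 3: Graph of f.]

The DF of X (Fig. 4) is obtained by integrating f:

$$F(x) = 0 \quad \text{if } x \leq 0,$$

$$F(x) = \int_0^x t\,dt = \frac{x^2}{2} \quad \text{if } 0 < x \leq 1,$$

$$F(x) = \int_0^1 t\,dt + \int_1^x (2 - t)\,dt = 2x - \frac{x^2}{2} - 1 \quad \text{if } 1 < x \leq 2,$$

and

$$F(x) = 1 \quad \text{if } x \geq 2.$$

[Fig. 4: Graph of F.]

Then

$$P\{0.3 < X \leq 1.5\} = P\{X \leq 1.5\} - P\{X \leq 0.3\} = 0.83.$$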

Example 6. Let k > 0 be a constant, and

$$f(x) = \begin{cases} kx(1 - x), & 0 < x < 1, \\ 0, & \text{otherwise.} \end{cases}$$

Then $\int_0^1 f(x)\,dx = k/6$. It follows that f(x) defines a PDF if k = 6. We have

$$P\{X > 0.3\} = 1 - 6\int_0^{0.3} x(1 - x)\,dx = 0.784.$$
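
Both the normalizing constant and the tail probability in Example 6 follow from the antiderivative $x^2/2 - x^3/3$ of $x(1 - x)$. A short exact-arithmetic sketch (the function name `integral_x_times_1_minus_x` is illustrative):

```python
from fractions import Fraction


def integral_x_times_1_minus_x(a, b):
    """Exact integral of x(1 - x) over [a, b] via the antiderivative x^2/2 - x^3/3."""

    def G(x):
        x = Fraction(x)
        return x**2 / 2 - x**3 / 3

    return G(b) - G(a)


# k is fixed by requiring the density k x (1 - x) to integrate to 1 over (0, 1).
k = 1 / integral_x_times_1_minus_x(0, 1)
p_gt = 1 - k * integral_x_times_1_minus_x(0, Fraction(3, 10))
print(k, p_gt, float(p_gt))
```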


We conclude this discussion by emphasizing that the two types of RVs considered above 
form only a part of the class of all RVs. These two classes, however, contain practically all 
the random variables that arise in practice. We note without proof (see Chung [15, p. 9]) 
that every DF F can be decomposed into two parts according to 


$$F(x) = aF_d(x) + (1 - a)F_c(x). \tag{9}$$

Here $F_d$ and $F_c$ are both DFs; $F_d$ is the DF of a discrete RV, while $F_c$ is a continuous (not
necessarily absolutely continuous) DF. In fact, $F_c$ can be further decomposed, but we will
not go into that (see Chung [15, p. 11]).


Example 7. Let X be an RV with DF 


$$F(x) = \begin{cases} 0, & x < 0, \\ \frac{1}{2}, & x = 0, \\ \frac{1}{2} + \frac{x}{2}, & 0 < x < 1, \\ 1, & 1 \le x. \end{cases}$$

Note that the DF F has a jump at x = 0 and F is continuous (in fact, absolutely continuous)
in the interval (0,1). F is the DF of an RV X that is neither discrete nor continuous. We
can write

$$F(x) = \tfrac{1}{2} F_d(x) + \tfrac{1}{2} F_c(x),$$

where

$$F_d(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$$

and

$$F_c(x) = \begin{cases} 0, & x \le 0, \\ x, & 0 < x < 1, \\ 1, & 1 \le x. \end{cases}$$

Here $F_d(x)$ is the DF of the RV degenerate at x = 0, and $F_c(x)$ is the DF with PDF

$$f_c(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$$
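
The decomposition in Example 7 can be verified pointwise. This is a small sketch under the assumption that the mixed DF equals 1/2 at x = 0 and 1/2 + x/2 on (0, 1), with equal weights 1/2 on the degenerate and the uniform component; the function names are illustrative only.

```python
def F(x: float) -> float:
    """Mixed DF of Example 7: jump of size 1/2 at 0, then linear on (0, 1)."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5 + x / 2
    return 1.0


def F_d(x: float) -> float:
    """DF of the RV degenerate at 0."""
    return 1.0 if x >= 0 else 0.0


def F_c(x: float) -> float:
    """DF of the uniform distribution on (0, 1)."""
    return 0.0 if x <= 0 else (x if x < 1 else 1.0)


grid = [-1.0, -0.25, 0.0, 0.1, 0.5, 0.9, 1.0, 2.5]
# Largest discrepancy between F and the mixture (1/2) F_d + (1/2) F_c on the grid
max_gap = max(abs(F(x) - (0.5 * F_d(x) + 0.5 * F_c(x))) for x in grid)
# Size of the jump of F at the origin
jump_at_zero = F(0.0) - F(-1e-12)
print(max_gap, jump_at_zero)
```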


PROBLEMS 2.4 


1. Let

$$p_k = p(1 - p)^k, \quad k = 0, 1, 2, \ldots, \quad 0 < p < 1.$$

Does $\{p_k\}$ define the PMF of some RV? What is the DF of this RV? If X is an RV
with PMF $\{p_k\}$, what is $P\{n \le X \le N\}$, where n, N (N > n) are positive integers?


2. In Problem 2.3.3, find the PDF associated with the DFs of parts (b), (c), and (d).
3. Does the function $f_\theta(x) = \theta^2 x e^{-\theta x}$ if x > 0, and = 0 if x ≤ 0, where θ > 0, define
a PDF? Find the DF associated with $f_\theta(x)$; if X is an RV with PDF $f_\theta(x)$, find
$P\{X \ge 1\}$.

4. Does the function $f_\theta(x) = \{(x + 1)/[\theta(\theta + 1)]\} e^{-x/\theta}$ if x > 0, and = 0 otherwise,
where θ > 0, define a PDF? Find the corresponding DF.

5. For what values of K do the following functions define the PMF of some RV?
(a) $f(x) = K(\lambda^x / x!)$, x = 0, 1, 2, ..., λ > 0.
(b) $f(x) = K/N$, x = 1, 2, ..., N.

6. Show that the function 


is a PDF. Find its DF. 
7. For the PDF f(x) = x if 0 < x ≤ 1, and = 2 − x if 1 < x ≤ 2, find $P\{1/6 < X \le 7/4\}$.
8. Which of the following functions are density functions:
(a) f(x) = x(2 − x), 0 < x < 2, and 0 elsewhere.
(b) f(x) = x(2x − 1), 0 < x < 2, and 0 elsewhere.
(c) $f(x) = (1/\lambda)\exp\{-(x - \theta)/\lambda\}$, x > θ, and 0 elsewhere, λ > 0.
(d) f(x) = sin x, 0 < x < π/2, and 0 elsewhere.
(e) f(x) = 0 for x < 0, = (x + 1)/9 for 0 ≤ x < 1, = 2(2x − 1)/9 for 1 ≤ x < 3/2,
= 2(5 − 2x)/9 for 3/2 ≤ x < 2, = 4/27 for 2 ≤ x < 5, and 0 elsewhere.
(f) $f(x) = 1/[\pi(1 + x^2)]$, $x \in \mathbb{R}$.
9. Are the following functions distribution functions? If so, find the corresponding
density or probability functions.
(a) F(x) = 0 for x ≤ 0, = x/2 for 0 ≤ x < 1, = 1/2 for 1 ≤ x < 2, = x/4 for
2 ≤ x < 4, and = 1 for x ≥ 4.
(b) F(x) = 0 if x < −θ, $= \frac{1}{2}\left(\frac{x}{\theta} + 1\right)$ if |x| ≤ θ, and 1 for x > θ, where θ > 0.
(c) F(x) = 0 if x < 0, and = 1 − (1 + x) exp(−x) if x ≥ 0.
(d) F(x) =Oifx<1,=(x—1)*/8 if 1 <x <3, and 1 forx>3. 
(ce) F(x) =Oifx<0,and=1—e~ ifx>0. 
10. Suppose P(X > x) is given for a random variable X (of the continuous type) for
all x. How will you find the corresponding density function? In particular find the
density function in each of the following cases:
(a) P(X > x) = 1 if x ≤ 0, and $P(X > x) = e^{-\lambda x}$ for x > 0; λ > 0 is a constant.
(b) P(X >x)=1ifx <0, and = (1+ x/A)~, for x > 0, A > 0 is a constant. 

(c) P(X > x) =1ifx <0, and =3/(1+x)?-—2/(1+x) ifx>0. 

(d) P(X > x) = 1 if $x < x_0$, and $= (x_0/x)^\alpha$ if $x \ge x_0$; $x_0 > 0$ and α > 0 are constants.


2.5 FUNCTIONS OF A RANDOM VARIABLE 
Let X be an RV with a known distribution, and let g be a function defined on the real line.
We seek the distribution of Y = g(X), provided that Y is also an RV. We first prove the
following result.


Theorem 1. Let X be an RV defined on $(\Omega, \mathcal{S}, P)$. Also, let g be a Borel-measurable
function on $\mathbb{R}$. Then g(X) is also an RV.


Proof. For $y \in \mathbb{R}$, we have

$$\{g(X) \le y\} = \{X \in g^{-1}(-\infty, y]\},$$

and since g is Borel-measurable, $g^{-1}(-\infty, y]$ is a Borel set. It follows that $\{g(X) \le y\} \in \mathcal{S}$,
and the proof is complete.


Theorem 2. Given an RV X with a known DF, the distribution of the RV Y = g(X), where 
g is a Borel-measurable function, is determined. 


Proof. Indeed, for all $y \in \mathbb{R}$

$$P\{Y \le y\} = P\{X \in g^{-1}(-\infty, y]\}. \tag{1}$$

In what follows, we will always assume that the functions under consideration are
Borel-measurable.


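Example 1. Let X be an RV with DF F. Then |X|, aX + b (where a ≠ 0 and b are constants),
$X^k$ (where k > 0 is an integer), and $|X|^\alpha$ (α > 0) are all RVs. Define

$$X^+ = \begin{cases} X, & X \ge 0, \\ 0, & X < 0, \end{cases}$$

and

$$X^- = \begin{cases} X, & X \le 0, \\ 0, & X > 0. \end{cases}$$

Then $X^+$, $X^-$ are also RVs. We have

$$P\{|X| \le y\} = P\{-y \le X \le y\} = P\{X \le y\} - P\{X < -y\} = F(y) - F(-y) + P\{X = -y\}, \quad y > 0;$$

$$P\{aX + b \le y\} = P\{aX \le y - b\} = \begin{cases} P\left\{X \le \dfrac{y - b}{a}\right\} & \text{if } a > 0, \\[2mm] P\left\{X \ge \dfrac{y - b}{a}\right\} & \text{if } a < 0; \end{cases}$$

$$P\{X^+ \le y\} = \begin{cases} 0 & \text{if } y < 0, \\ P\{X \le 0\} & \text{if } y = 0, \\ P\{X < 0\} + P\{0 \le X \le y\} & \text{if } y > 0. \end{cases}$$

Similarly,

$$P\{X^- \le y\} = \begin{cases} 1 & \text{if } y \ge 0, \\ P\{X \le y\} & \text{if } y < 0. \end{cases}$$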


Let X be an RV of the discrete type, and A be the countable set such that $P\{X \in A\} = 1$
and $P\{X = x\} > 0$ for $x \in A$. Let Y = g(X) be a one-to-one mapping from A onto some
set B. Then the inverse map, $g^{-1}$, is a single-valued function of y. To find $P\{Y = y\}$, we
note that

$$P\{Y = y\} = P\{g(X) = y\} = P\{X = g^{-1}(y)\}, \quad y \in B,$$

and $P\{Y = y\} = 0$, $y \in B^c$.

Example 2. Let X be a Poisson RV with PMF

$$P\{X = k\} = \begin{cases} e^{-\lambda} \dfrac{\lambda^k}{k!}, & k = 0, 1, 2, \ldots; \ \lambda > 0, \\ 0, & \text{otherwise}. \end{cases}$$

Let $Y = X^2 + 3$. Then $y = x^2 + 3$ maps A = {0, 1, 2, ...} onto B = {3, 4, 7, 12, 19, 28, ...}.
The inverse map is $x = \sqrt{y - 3}$, and since there are no negative values in A we take the
positive square root of y − 3. We have

$$P\{Y = y\} = P\{X = \sqrt{y - 3}\} = \frac{e^{-\lambda} \lambda^{\sqrt{y - 3}}}{(\sqrt{y - 3})!}, \quad y \in B,$$

and P{Y = y} = 0 elsewhere.
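
As an illustration of the one-to-one discrete case, the PMF of Y = X² + 3 for a Poisson X can be evaluated directly. This is a sketch only; the value λ = 2 and the helper names `poisson_pmf` and `pmf_y` are assumptions made for the example.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P{X = k} for a Poisson RV with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def pmf_y(y: int, lam: float) -> float:
    """P{Y = y} for Y = X**2 + 3: nonzero only when y - 3 is a perfect square."""
    if y < 3:
        return 0.0
    root = math.isqrt(y - 3)
    if root * root != y - 3:      # y is not in B = {3, 4, 7, 12, 19, ...}
        return 0.0
    return poisson_pmf(root, lam)  # the single inverse is x = sqrt(y - 3)


lam = 2.0
support = [k * k + 3 for k in range(60)]         # first 60 points of B
total = sum(pmf_y(y, lam) for y in support)      # should be close to 1
print(total, pmf_y(7, lam), pmf_y(5, lam))
```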


Actually the restriction to a single-valued inverse on g is not necessary. If g has a 
finite (or even a countable) number of inverses for each y, from countable additivity of P 
we have 


$$P\{Y = y\} = P\{g(X) = y\} = P\Big\{\bigcup_{a} [X = a, \ g(a) = y]\Big\} = \sum_{a} P\{X = a, \ g(a) = y\}.$$


Example 3. Let X be an RV with PMF 


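$$P\{X = -2\} = \tfrac{1}{5}, \quad P\{X = -1\} = \tfrac{1}{6}, \quad P\{X = 0\} = \tfrac{1}{5}, \quad P\{X = 1\} = \tfrac{1}{15}, \quad \text{and} \quad P\{X = 2\} = \tfrac{11}{30}.$$


Let $Y = X^2$. Then

$$A = \{-2, -1, 0, 1, 2\} \quad \text{and} \quad B = \{0, 1, 4\}.$$

We have

$$P\{Y = y\} = \begin{cases} \tfrac{1}{5}, & y = 0, \\ \tfrac{1}{6} + \tfrac{1}{15} = \tfrac{7}{30}, & y = 1, \\ \tfrac{1}{5} + \tfrac{11}{30} = \tfrac{17}{30}, & y = 4. \end{cases}$$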
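
When g is not one-to-one, the PMF of Y is obtained by summing P{X = a} over all a with g(a) = y. The sketch below does this with exact fractions, assuming the PMF of Example 3 is 1/5, 1/6, 1/5, 1/15, 11/30 on −2, −1, 0, 1, 2; the helper name `pushforward` is ours.

```python
from collections import defaultdict
from fractions import Fraction as Fr

# Assumed PMF of X (values as read from Example 3)
pmf_x = {-2: Fr(1, 5), -1: Fr(1, 6), 0: Fr(1, 5), 1: Fr(1, 15), 2: Fr(11, 30)}


def pushforward(pmf, g):
    """PMF of Y = g(X): add up P{X = a} over every a with g(a) = y."""
    out = defaultdict(Fr)          # Fraction() is 0
    for a, p in pmf.items():
        out[g(a)] += p
    return dict(out)


pmf_y = pushforward(pmf_x, lambda x: x * x)
for y in sorted(pmf_y):
    print(y, pmf_y[y])
```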


The case in which X is an RV of the continuous type is not as simple. First we note that
if X is a continuous type RV and g is some Borel-measurable function, Y = g(X) may not
be an RV of the continuous type.

Example 4. Let X be an RV with uniform distribution on [−1, 1], that is, the PDF of X is
f(x) = 1/2, −1 ≤ x ≤ 1, and = 0 elsewhere. Let $Y = X^+$. Then, from Example 1,

$$P\{Y \le y\} = \begin{cases} 0, & y < 0, \\ \tfrac{1}{2}, & y = 0, \\ \tfrac{1}{2} + \tfrac{1}{2} y, & 1 \ge y > 0, \\ 1, & y > 1. \end{cases}$$

We see that the DF of Y has a jump at y = 0 and that Y is neither discrete nor continuous.
Note that all we require is that P{X < 0} > 0 for $X^+$ to be of the mixed type.


Example 4 shows that we need some conditions on g to ensure that g(X) is also an RV 
of the continuous type whenever X is continuous. This is the case when g is a continuous 
monotonic function. A sufficient condition is given in the following theorem. 


Theorem 3. Let X be an RV of the continuous type with PDF f. Let y = g(x) be differentiable
for all x and either g′(x) > 0 for all x or g′(x) < 0 for all x. Then Y = g(X) is also
an RV of the continuous type with PDF given by

$$h(y) = \begin{cases} f[g^{-1}(y)] \left| \dfrac{d}{dy} g^{-1}(y) \right|, & \alpha < y < \beta, \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$

where $\alpha = \min\{g(-\infty), g(+\infty)\}$ and $\beta = \max\{g(-\infty), g(+\infty)\}$.


Proof. If g is differentiable for all x and g′(x) > 0 for all x, then g is continuous and strictly
increasing, the limits α, β exist (may be infinite), and the inverse function $x = g^{-1}(y)$
exists, is strictly increasing, and is differentiable. The DF of Y for α < y < β is given by

$$P\{Y \le y\} = P\{X \le g^{-1}(y)\}.$$

The PDF of Y is obtained on differentiation. We have

$$h(y) = \frac{d}{dy} P\{Y \le y\} = f[g^{-1}(y)] \frac{d}{dy} g^{-1}(y).$$

Similarly, if g′ < 0, then g is strictly decreasing and we have

$$P\{Y \le y\} = P\{X \ge g^{-1}(y)\} = 1 - P\{X \le g^{-1}(y)\} \quad (X \text{ is a continuous type RV})$$

so that

$$h(y) = -f[g^{-1}(y)] \cdot \frac{d}{dy} g^{-1}(y).$$

Since g and $g^{-1}$ are both strictly decreasing, $(d/dy) g^{-1}(y)$ is negative and (2) follows.

Note that

$$\frac{d}{dy} g^{-1}(y) = \left. \frac{1}{dg(x)/dx} \right|_{x = g^{-1}(y)},$$

so that (2) may be rewritten as

$$h(y) = \left. \frac{f(x)}{|g'(x)|} \right|_{x = g^{-1}(y)}, \quad \alpha < y < \beta. \tag{3}$$


Remark 1. The key to computation of the induced distribution of Y = g(X) from the
distribution of X is (1). If the conditions of Theorem 3 are satisfied, we are able to identify
the set $\{X \in g^{-1}(-\infty, y]\}$ as $\{X \le g^{-1}(y)\}$ or $\{X \ge g^{-1}(y)\}$, according to whether g is
increasing or decreasing. In practice Theorem 3 is quite useful, but whenever the conditions
are violated one should return to (1) to compute the induced distribution. This is the
case, for example, in Examples 7 and 8 and Theorem 4 below.


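Remark 2. If the PDF f of X vanishes outside an interval [a,b] of finite length, we need
only to assume that g is differentiable in (a,b) and either g′(x) > 0 or g′(x) < 0 throughout
the interval. Then we take

$$\alpha = \min\{g(a), g(b)\} \quad \text{and} \quad \beta = \max\{g(a), g(b)\}$$

in Theorem 3.


Example 5. Let X have the density f(x) = 1, 0 < x < 1, and = 0 otherwise. Let $Y = e^X$.
Then X = log Y, and we have

$$h(y) = \left| \frac{1}{y} \right| \cdot 1, \quad 0 < \log y < 1,$$

that is,

$$h(y) = \begin{cases} \dfrac{1}{y}, & 1 < y < e, \\ 0, & \text{otherwise}. \end{cases}$$

If y = −2 log x, then $x = e^{-y/2}$ and

$$h(y) = \left| -\tfrac{1}{2} e^{-y/2} \right| \cdot 1, \quad 0 < e^{-y/2} < 1,$$

$$\phantom{h(y)} = \begin{cases} \tfrac{1}{2} e^{-y/2}, & 0 < y < \infty, \\ 0, & \text{otherwise}. \end{cases}$$

Example 6. Let X be a nonnegative RV of the continuous type with PDF f, and let α > 0.
Let $Y = X^\alpha$. Then

$$P\{X^\alpha \le y\} = \begin{cases} P\{X \le y^{1/\alpha}\} & \text{if } y \ge 0, \\ 0 & \text{if } y < 0. \end{cases}$$

The PDF of Y is given by

$$h(y) = f(y^{1/\alpha}) \left| \frac{d}{dy} y^{1/\alpha} \right| = \begin{cases} \dfrac{1}{\alpha} y^{1/\alpha - 1} f(y^{1/\alpha}), & y > 0, \\ 0, & y \le 0. \end{cases}$$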
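
Formula (3) of Theorem 3 translates almost literally into code. The following sketch, in which the helper `density_of_transform` is a hypothetical name, applies it to the two transformations of Example 5 (X uniform on (0, 1), Y = e^X and Y = −2 log X).

```python
import math


def density_of_transform(f, g_inv, g_prime):
    """PDF of Y = g(X) for strictly monotone g: h(y) = f(x) / |g'(x)| at x = g_inv(y).

    The caller must pass y values in the domain of g_inv.
    """
    def h(y: float) -> float:
        x = g_inv(y)
        return f(x) / abs(g_prime(x))
    return h


def uniform_pdf(x: float) -> float:
    """PDF of the uniform distribution on (0, 1)."""
    return 1.0 if 0 < x < 1 else 0.0


# Y = e^X: inverse is log y, derivative of g is e^x; expect h(y) = 1/y on (1, e)
h_exp = density_of_transform(uniform_pdf, math.log, math.exp)
# Y = -2 log X: inverse is e^{-y/2}, derivative of g is -2/x; expect h(y) = e^{-y/2}/2
h_log = density_of_transform(uniform_pdf, lambda y: math.exp(-y / 2), lambda x: -2 / x)

print(h_exp(2.0), h_log(3.0))
```

The indicator in `uniform_pdf` takes care of the range α < y < β automatically: outside it, the inverse image falls outside (0, 1) and the density is 0.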


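Example 7. Let X be an RV with PDF

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \quad -\infty < x < \infty.$$

Let $Y = X^2$. In this case, g′(x) = 2x which is > 0 for x > 0, and < 0 for x < 0, so that the
conditions of Theorem 3 are not satisfied. But for y > 0

$$P\{Y \le y\} = P\{-\sqrt{y} \le X \le \sqrt{y}\} = F(\sqrt{y}) - F(-\sqrt{y}),$$

where F is the DF of X. Thus the PDF of Y is given by

$$h(y) = \begin{cases} \dfrac{1}{2\sqrt{y}} \{f(\sqrt{y}) + f(-\sqrt{y})\}, & y > 0, \\ 0, & y \le 0. \end{cases}$$

Thus

$$h(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi y}} e^{-y/2}, & 0 < y, \\ 0, & y \le 0. \end{cases}$$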
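
The density of Y = X² can be checked against a numerical derivative of the DF F(√y) − F(−√y). This is a sketch using only the standard library; the normal DF is expressed through `math.erf`, and the step size 1e-6 is an arbitrary choice.

```python
import math


def Phi(x: float) -> float:
    """Standard normal DF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf_y(y: float) -> float:
    """P{X^2 <= y} = F(sqrt(y)) - F(-sqrt(y)) for y > 0."""
    if y <= 0:
        return 0.0
    r = math.sqrt(y)
    return Phi(r) - Phi(-r)


def pdf_y(y: float) -> float:
    """Closed form from Example 7: exp(-y/2) / sqrt(2 pi y) for y > 0."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y) if y > 0 else 0.0


def numeric_derivative(fn, y: float, eps: float = 1e-6) -> float:
    """Central-difference approximation to fn'(y)."""
    return (fn(y + eps) - fn(y - eps)) / (2 * eps)


gaps = [abs(numeric_derivative(cdf_y, y) - pdf_y(y)) for y in (0.5, 1.0, 2.0, 4.0)]
print(max(gaps))
```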
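Example 8. Let X be an RV with PDF

$$f(x) = \begin{cases} \dfrac{2x}{\pi^2}, & 0 < x < \pi, \\ 0, & \text{otherwise}. \end{cases}$$

Let Y = sin X. In this case g′(x) = cos x > 0 for x in (0, π/2) and < 0 for x in (π/2, π),
so that the conditions of Theorem 3 are not satisfied. To compute the PDF of Y we return
to (1) and see that (Fig. 1) the DF of Y is given by

[Fig. 1. y = sin x, 0 ≤ x ≤ π.]

$$P\{Y \le y\} = P\{\sin X \le y\}, \quad 0 < y < 1,$$
$$\phantom{P\{Y \le y\}} = P\{[0 \le X \le x_1] \cup [x_2 \le X \le \pi]\},$$

where $x_1 = \sin^{-1} y$ and $x_2 = \pi - \sin^{-1} y$. Thus

$$P\{Y \le y\} = \int_0^{x_1} f(x)\,dx + \int_{x_2}^{\pi} f(x)\,dx = \left(\frac{x_1}{\pi}\right)^2 + 1 - \left(\frac{x_2}{\pi}\right)^2,$$

and the PDF of Y is given by

$$h(y) = \frac{d}{dy}\left(\frac{\sin^{-1} y}{\pi}\right)^2 + \frac{d}{dy}\left[1 - \left(\frac{\pi - \sin^{-1} y}{\pi}\right)^2\right]
= \begin{cases} \dfrac{2}{\pi\sqrt{1 - y^2}}, & 0 < y < 1, \\ 0, & \text{otherwise}. \end{cases}$$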
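
A quick simulation gives an independent check of the DF of Y = sin X in Example 8. The sketch below assumes the density 2x/π² on (0, π), so that the DF of X is x²/π² and X can be sampled as π√U with U uniform on (0, 1); the seed, the sample size, and the test point 0.5 are arbitrary choices.

```python
import math
import random


def cdf_y(y: float) -> float:
    """DF of Y = sin X from Example 8, valid for 0 < y < 1."""
    x1 = math.asin(y)
    x2 = math.pi - x1
    return (x1 / math.pi) ** 2 + 1 - (x2 / math.pi) ** 2


random.seed(0)
n = 200_000
# Inverse-transform sampling: F_X(x) = x^2 / pi^2 on (0, pi), hence X = pi * sqrt(U)
samples = [math.sin(math.pi * math.sqrt(random.random())) for _ in range(n)]

y0 = 0.5
empirical = sum(s <= y0 for s in samples) / n
print(empirical, cdf_y(y0))
```

Expanding the squares shows that the DF simplifies to (2/π) sin⁻¹ y, which is consistent with the density 2/(π√(1 − y²)).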


In Examples 7 and 8 the function y = g(x) can be written as the sum of two monotone
functions. We applied Theorem 3 to each of these monotonic summands. These two
examples are special cases of the following result.


Theorem 4. Let X be an RV of the continuous type with PDF f. Let y = g(x) be differentiable
for all x, and assume that g′(x) is continuous and nonzero at all but a finite number
of values of x. Then, for every real number y, either

(a) there exist a positive integer n = n(y) and real numbers (inverses) $x_1(y), x_2(y), \ldots, x_n(y)$ such that

$$g[x_k(y)] = y, \quad g'[x_k(y)] \ne 0, \quad k = 1, 2, \ldots, n(y),$$

or

(b) there does not exist any x such that g(x) = y, g′(x) ≠ 0, in which case we write
n(y) = 0.

Then Y is a continuous RV with PDF given by

$$h(y) = \begin{cases} \sum_{k=1}^{n} f[x_k(y)] \, |g'[x_k(y)]|^{-1} & \text{if } n > 0, \\ 0 & \text{if } n = 0. \end{cases}$$


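Example 9. Let X be an RV with PDF f, and let Y = |X|. Here n(y) = 2, $x_1(y) = y$,
$x_2(y) = -y$ for y > 0, and

$$h(y) = \begin{cases} f(y) + f(-y), & y > 0, \\ 0, & y \le 0. \end{cases}$$

Thus, if f(x) = 1/2, −1 ≤ x ≤ 1, and = 0 otherwise, then

$$h(y) = \begin{cases} 1, & 0 \le y \le 1, \\ 0, & \text{otherwise}. \end{cases}$$

If $f(x) = (1/\sqrt{2\pi}) e^{-(x^2/2)}$, −∞ < x < ∞, then

$$h(y) = \begin{cases} \dfrac{2}{\sqrt{2\pi}} e^{-(y^2/2)}, & y > 0, \\ 0, & \text{otherwise}. \end{cases}$$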
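
The sum over inverses in Theorem 4 can be sketched as a small higher-order function. The names `pdf_many_to_one`, `h_abs`, and `h_sq` are illustrative only; the caller supplies the list of inverses for each y, and the examples take X to be standard normal.

```python
import math


def pdf_many_to_one(f, inverses, g_prime):
    """h(y) = sum over roots x_k(y) of f(x_k) / |g'(x_k)|, skipping roots with g' = 0."""
    def h(y: float) -> float:
        return sum(f(x) / abs(g_prime(x)) for x in inverses(y) if g_prime(x) != 0)
    return h


def phi(x: float) -> float:
    """Standard normal PDF."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Y = |X|: inverses are y and -y for y > 0, and |g'| = 1 at both
h_abs = pdf_many_to_one(
    phi,
    lambda y: [y, -y] if y > 0 else [],
    lambda x: 1.0 if x > 0 else -1.0,
)

# Y = X^2: inverses are +sqrt(y) and -sqrt(y) for y > 0, and g'(x) = 2x
h_sq = pdf_many_to_one(
    phi,
    lambda y: [math.sqrt(y), -math.sqrt(y)] if y > 0 else [],
    lambda x: 2 * x,
)

print(h_abs(1.0), h_sq(2.0))
```

An empty list of inverses corresponds to the case n(y) = 0, for which the sum, and hence h(y), is 0.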


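Example 10. Let X be an RV of the continuous type with PDF f, and let $Y = X^{2m}$, where
m is a positive integer. In this case $g(x) = x^{2m}$, $g'(x) = 2m x^{2m-1} > 0$ for x > 0 and g′(x) < 0
for x < 0. Writing n = 2m, we see that, for any y > 0, n(y) = 2, $x_1(y) = -y^{1/n}$, $x_2(y) = y^{1/n}$.
It follows that

$$h(y) = f[x_1(y)] \cdot \frac{1}{n y^{1 - 1/n}} + f[x_2(y)] \cdot \frac{1}{n y^{1 - 1/n}}
= \begin{cases} \dfrac{1}{n y^{1 - 1/n}} \left[f(y^{1/n}) + f(-y^{1/n})\right] & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}$$

In particular, if f is the PDF given in Example 7, then

$$h(y) = \begin{cases} \dfrac{2}{\sqrt{2\pi}\, n y^{1 - 1/n}} \exp\left\{-\dfrac{y^{2/n}}{2}\right\} & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}$$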


Remark 3. The basic formula (1) and the countable additivity of probability allow us to
compute the distribution of Y = g(X) in some instances even if g has a countable number
of inverses. Let $A \subseteq \mathbb{R}$ and g map A into $B \subseteq \mathbb{R}$. Suppose that A can be represented as a
countable union of disjoint sets $A_k$, k = 1, 2, .... Then the DF of Y is given by

$$P\{Y \le y\} = P\{X \in g^{-1}(-\infty, y]\}
= P\Big\{X \in \bigcup_{k=1}^{\infty} \big[g^{-1}(-\infty, y] \cap A_k\big]\Big\}
= \sum_{k=1}^{\infty} P\{X \in A_k \cap g^{-1}(-\infty, y]\}.$$

If the conditions of Theorem 3 are satisfied by the restriction of g to each $A_k$, we may
obtain the PDF of Y on differentiating the DF of Y. We remind the reader that term-by-term
differentiation is permissible if the differentiated series is uniformly convergent.


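Example 11. Let X be an RV with PDF

$$f(x) = \begin{cases} \theta e^{-\theta x}, & x > 0, \\ 0, & x \le 0, \end{cases} \qquad \theta > 0.$$

Let Y = sin X, and let $\sin^{-1} y$ be the principal value. Then (Fig. 2), for 0 < y < 1,

$$\begin{aligned}
P\{\sin X \le y\}
&= P\{0 < X \le \sin^{-1} y \ \text{or} \ (2n - 1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y \ \text{for all integers } n \ge 1\} \\
&= P\{0 < X \le \sin^{-1} y\} + \sum_{n=1}^{\infty} P\{(2n - 1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y\} \\
&= 1 - e^{-\theta \sin^{-1} y} + \sum_{n=1}^{\infty} \left( e^{-\theta[(2n - 1)\pi - \sin^{-1} y]} - e^{-\theta(2n\pi + \sin^{-1} y)} \right) \\
&= 1 - e^{-\theta \sin^{-1} y} + \left( e^{\theta\pi + \theta \sin^{-1} y} - e^{-\theta \sin^{-1} y} \right) \sum_{n=1}^{\infty} e^{-(2\theta\pi)n} \\
&= 1 - e^{-\theta \sin^{-1} y} + \left( e^{\theta\pi + \theta \sin^{-1} y} - e^{-\theta \sin^{-1} y} \right) \frac{e^{-2\theta\pi}}{1 - e^{-2\theta\pi}} \\
&= 1 + \frac{e^{-\theta\pi + \theta \sin^{-1} y} - e^{-\theta \sin^{-1} y}}{1 - e^{-2\pi\theta}}.
\end{aligned}$$

[Fig. 2. y = sin x, x ≥ 0.]

A similar computation can be made for y < 0. It follows that the PDF of Y is given by

$$h(y) = \begin{cases}
\theta e^{-\theta\pi} (1 - e^{-2\theta\pi})^{-1} (1 - y^2)^{-1/2} \left[ e^{\theta \sin^{-1} y} + e^{-\theta\pi - \theta \sin^{-1} y} \right] & \text{if } -1 < y < 0, \\
\theta (1 - e^{-2\theta\pi})^{-1} (1 - y^2)^{-1/2} \left[ e^{-\theta \sin^{-1} y} + e^{-\theta\pi + \theta \sin^{-1} y} \right] & \text{if } 0 < y < 1, \\
0 & \text{otherwise}.
\end{cases}$$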
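
The closed form for the DF of Y = sin X in Example 11 can be compared with a truncated version of the series it was obtained from. This is a sketch for 0 < y < 1 only, assuming the exponential density θe^{−θx}; the values θ = 0.7 and y = 0.4 and the truncation at 200 terms are arbitrary.

```python
import math


def cdf_series(y: float, theta: float, terms: int = 200) -> float:
    """P{sin X <= y}, 0 < y < 1, summing interval probabilities term by term."""
    s = math.asin(y)
    total = 1 - math.exp(-theta * s)                 # P{0 < X <= asin(y)}
    for n in range(1, terms + 1):
        lo = (2 * n - 1) * math.pi - s
        hi = 2 * n * math.pi + s
        total += math.exp(-theta * lo) - math.exp(-theta * hi)   # P{lo <= X <= hi}
    return total


def cdf_closed(y: float, theta: float) -> float:
    """Closed form obtained after summing the geometric series."""
    s = math.asin(y)
    num = math.exp(-theta * math.pi + theta * s) - math.exp(-theta * s)
    return 1 + num / (1 - math.exp(-2 * math.pi * theta))


theta, y = 0.7, 0.4
gap = abs(cdf_series(y, theta) - cdf_closed(y, theta))
print(gap, cdf_closed(1.0, theta))
```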
PROBLEMS 2.5 


1. Let X be a random variable with probability mass function

$$P\{X = r\} = \binom{n}{r} p^r (1 - p)^{n - r}, \quad r = 0, 1, 2, \ldots, n, \quad 0 \le p \le 1.$$

Find the PMFs of the RVs (a) Y = aX + b, (b) $Y = X^2$, and (c) $Y = \sqrt{X}$.
2. Let X be an RV with PDF 


0 ifx <0, 

1 

1 

sa ifl<x<oo. 


Find the PDF of the RV 1/X.

3. Let X be a positive RV of the continuous type with PDF f(·). Find the PDF of the
RV U = X/(1 + X). If, in particular, X has the PDF


e={4 O<x<1, 


0, otherwise, 


what is the PDF of U? 


4. Let X be an RV with PDF f defined in Example 11. Let Y = cos X and Z = tan X.
Find the DFs and PDFs of Y and Z.

5. Let X be an RV with PDF

$$f_\theta(x) = \begin{cases} \theta e^{-\theta x} & \text{if } x \ge 0, \\ 0 & \text{otherwise}, \end{cases}$$

where θ > 0. Let $Y = [X - 1/\theta]^2$. Find the PDF of Y.

6. A point is chosen at random on the circumference of a circle of radius r with center
at the origin, that is, the polar angle θ of the point chosen has the PDF

$$f(\theta) = \frac{1}{2\pi}, \quad \theta \in (-\pi, \pi).$$

Find the PDF of the abscissa of the point selected.

7. For the RV X of Example 7 find the PDF of the following RVs: (a) $Y_1 = e^X$, (b) $Y_2 = 2X^2 + 1$,
and (c) $Y_3 = g(X)$, where g(x) = 1 if x > 0, = 1/2 if x = 0, and = −1 if x < 0.

8. Suppose that a projectile is fired at an angle θ above the earth with a velocity V.
Assuming that θ is an RV with PDF

$$f(\theta) = \begin{cases} \dfrac{12}{\pi} & \text{if } \dfrac{\pi}{6} < \theta < \dfrac{\pi}{4}, \\ 0 & \text{otherwise}, \end{cases}$$

find the PDF of the range R of the projectile, where $R = V^2 \sin 2\theta / g$, g being the
gravitational constant.

9. Let X be an RV with PDF f(x) = 1/(2π) if 0 < x < 2π, and = 0 otherwise. Let
Y = sin X. Find the DF and PDF of Y.

10. Let X be an RV with PDF f(x) = 1/3 if −1 < x < 2, and = 0 otherwise. Let Y = |X|.
Find the PDF of Y.

11. Let X be an RV with PDF f(x) = 1/(2θ) if −θ ≤ x ≤ θ, and = 0 otherwise. Let
$Y = 1/X^2$. Find the PDF of Y.

12. Let X be an RV of the continuous type, and let Y = g(X) be defined as follows:
(a) g(x) = 1 if x > 0, and = −1 if x ≤ 0.
(b) g(x) = b if x ≥ b, = x if |x| < b, and = −b if x ≤ −b.
(c) g(x) = x if |x| ≥ b, and = 0 if |x| < b.
Find the distribution of Y in each case.


MOMENTS AND GENERATING FUNCTIONS


3.1 INTRODUCTION


The study of probability distributions of a random variable is essentially the study of some
numerical characteristics associated with them. These so-called parameters of the distribution
play a key role in mathematical statistics. In Section 3.2 we introduce some of these
parameters, namely, moments and order parameters, and investigate their properties. In
Section 3.3 the idea of generating functions is introduced. In particular, we study probability
generating functions, moment generating functions, and characteristic functions.
Section 3.4 deals with some moment inequalities.


3.2 MOMENTS OF A DISTRIBUTION FUNCTION


In this section we investigate some numerical characteristics, called parameters, associated
with the distribution of an RV X. These parameters are (a) moments and their functions
and (b) order parameters. We will concentrate mainly on moments and their properties.

Let X be a random variable of the discrete type with probability mass function
$p_k = P\{X = x_k\}$, k = 1, 2, .... If

$$\sum_{k=1}^{\infty} |x_k| p_k < \infty, \tag{1}$$

we say that the expected value (or the mean or the mathematical expectation) of X exists
and write

$$\mu = EX = \sum_{k=1}^{\infty} x_k p_k. \tag{2}$$

Note that the series $\sum_{k=1}^{\infty} x_k p_k$ may converge but the series $\sum_{k=1}^{\infty} |x_k| p_k$ may not. In that
case we say that EX does not exist.


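Example 1. Let X have the PMF given by

$$p_j = P\left\{X = (-1)^{j+1} \frac{3^j}{j}\right\} = \frac{2}{3^j}, \quad j = 1, 2, \ldots.$$

Then

$$\sum_{j=1}^{\infty} |x_j| p_j = \sum_{j=1}^{\infty} \frac{2}{j} = \infty,$$

and EX does not exist, although the series

$$\sum_{j=1}^{\infty} x_j p_j = \sum_{j=1}^{\infty} (-1)^{j+1} \frac{2}{j}$$

is convergent.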
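
The difference between the two series can be seen from their partial sums. The sketch below assumes the PMF $p_j = 2/3^j$ at the points $(-1)^{j+1} 3^j / j$, so that the signed terms are $(-1)^{j+1} 2/j$ and the absolute terms are $2/j$; the function name `partial_sums` is ours.

```python
import math


def partial_sums(n_terms: int):
    """Partial sums of sum x_j p_j and sum |x_j| p_j for the PMF of Example 1."""
    signed = 0.0     # terms (-1)^(j+1) * 2 / j
    absolute = 0.0   # terms 2 / j
    for j in range(1, n_terms + 1):
        term = 2.0 / j
        signed += term if j % 2 == 1 else -term
        absolute += term
    return signed, absolute


for n in (10, 1_000, 100_000):
    signed, absolute = partial_sums(n)
    print(n, round(signed, 5), round(absolute, 3))
```

The signed sums settle near 2 log 2 (the alternating harmonic series), while the absolute sums grow like 2 log n, which is why EX is said not to exist.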


If X is of the continuous type and has PDF f, we say that EX exists and equals $\int x f(x)\,dx$,
provided that

$$\int |x| f(x)\,dx < \infty.$$

A similar definition is given for the mean of any Borel-measurable function h(X) of X.
Thus, if X is of the continuous type and has PDF f, we say that Eh(X) exists and equals
$\int h(x) f(x)\,dx$, provided that

$$\int |h(x)| f(x)\,dx < \infty.$$

We emphasize that the condition $\int |x| f(x)\,dx < \infty$ must be checked before it can be
concluded that EX exists and equals $\int x f(x)\,dx$. Moreover, it is worthwhile to recall at
this point that the integral $\int_{-\infty}^{\infty} \varphi(x)\,dx$ exists, provided that the limit
$\lim_{a \to \infty,\, b \to \infty} \int_{-a}^{b} \varphi(x)\,dx$ exists. It is quite possible for the limit
$\lim_{a \to \infty} \int_{-a}^{a} \varphi(x)\,dx$ to exist without the existence
of $\int_{-\infty}^{\infty} \varphi(x)\,dx$. As an example, consider the Cauchy PDF:

$$f(x) = \frac{1}{\pi} \frac{1}{1 + x^2}, \quad -\infty < x < \infty.$$

Clearly

$$\lim_{a \to \infty} \int_{-a}^{a} \frac{x}{\pi} \frac{1}{1 + x^2}\,dx = 0.$$

However, EX does not exist since the integral $(1/\pi) \int_{-\infty}^{\infty} |x|/(1 + x^2)\,dx$ diverges.
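
A numerical sketch makes the Cauchy example concrete: the symmetric truncated integral of x f(x) stays at 0, while the truncated integral of |x| f(x) equals (1/π) log(1 + a²) and grows without bound. The midpoint rule and the grid size below are arbitrary choices, not part of the text.

```python
import math


def cauchy_pdf(x: float) -> float:
    return 1.0 / (math.pi * (1.0 + x * x))


def midpoint(fn, lo: float, hi: float, n: int = 100_000) -> float:
    """Composite midpoint rule on [lo, hi] with n cells."""
    step = (hi - lo) / n
    return step * sum(fn(lo + (i + 0.5) * step) for i in range(n))


def truncated_integrals(a: float):
    """Integrals of x f(x) and |x| f(x) over the symmetric interval [-a, a]."""
    signed = midpoint(lambda x: x * cauchy_pdf(x), -a, a)
    absolute = midpoint(lambda x: abs(x) * cauchy_pdf(x), -a, a)
    return signed, absolute


for a in (10.0, 100.0, 1000.0):
    signed, absolute = truncated_integrals(a)
    exact_abs = math.log(1 + a * a) / math.pi     # closed form of the absolute integral
    print(a, abs(signed) < 1e-9, round(absolute, 3), round(exact_abs, 3))
```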
Remark 1. Let $X(\omega) = I_A(\omega)$ for some $A \in \mathcal{S}$. Then EX = P(A).

Remark 2. If we write h(X) = |X|, we see that EX exists if and only if E|X| does.

Remark 3. We say that an RV X is symmetric about a point a if

$$P\{X \ge a + x\} = P\{X \le a - x\} \quad \text{for all } x.$$

In terms of DF F of X, this means that, if

$$F(a - x) = 1 - F(a + x) + P\{X = a + x\}$$

holds for all $x \in \mathbb{R}$, we say that the DF F (or the RV X) is symmetric with a as the center
of symmetry. If a = 0, then for every x

$$F(-x) = 1 - F(x) + P\{X = x\}.$$

In particular, if X is an RV of the continuous type, X is symmetric with center a if and
only if the PDF f of X satisfies

$$f(a - x) = f(a + x) \quad \text{for all } x.$$

If a = 0, we will say simply that X is symmetric (or that F is symmetric).

As an immediate consequence of this definition we see that, if X is symmetric with a
as the center of symmetry and E|X| < ∞, then EX = a. A simple example of a symmetric
distribution is the Cauchy PDF considered above (before Remark 1). We will encounter
many such distributions later.


Remark 4. If a and b are constants and X is an RV with E|X| < ∞, then E|aX + b| < ∞
and E{aX + b} = aEX + b. In particular, E{X − μ} = 0, a fact that should not come as a
surprise.


Remark 5. If X is bounded, that is, if P{|X| ≤ M} = 1, 0 < M < ∞, then EX exists.

Remark 6. If P{X ≥ 0} = 1, and EX exists, then EX ≥ 0.


Theorem 1. Let X be an RV, and g be a Borel-measurable function on $\mathbb{R}$. Let Y = g(X).
If X is of discrete type then

$$EY = \sum_{j=1}^{\infty} g(x_j) P\{X = x_j\} \tag{3}$$

in the sense that, if either side of (3) exists, so does the other, and then the two are equal.
If X is of continuous type with PDF f then $EY = \int g(x) f(x)\,dx$ in the sense that, if either
of the two integrals converges absolutely, so does the other, and the two are equal.


Remark 7. Let X be a discrete RV. Then Theorem 1 says that

$$\sum_{j=1}^{\infty} g(x_j) P\{X = x_j\} = \sum_{k=1}^{\infty} y_k P\{Y = y_k\}$$

in the sense that, if either of the two series converges absolutely, so does the other, and
the two sums are equal. If X is of the continuous type with PDF f, let h(y) be the PDF of
Y = g(X). Then, according to Theorem 1,

$$\int g(x) f(x)\,dx = \int y h(y)\,dy$$

provided that E|g(X)| < ∞.


Proof of Theorem 1. In the discrete case, suppose that $P\{X \in A\} = 1$. If y = g(x) is a
one-to-one mapping of A onto some set B, then

$$P\{Y = y\} = P\{X = g^{-1}(y)\}, \quad y \in B.$$

We have

$$\sum_{x \in A} g(x) P\{X = x\} = \sum_{y \in B} y P\{Y = y\}.$$

In the continuous case, suppose g satisfies the conditions of Theorem 2.5.3. Then

$$\int g(x) f(x)\,dx = \int_{\alpha}^{\beta} y f[g^{-1}(y)] \left| \frac{d}{dy} g^{-1}(y) \right| dy$$

by changing the variable to y = g(x). Thus

$$\int g(x) f(x)\,dx = \int_{\alpha}^{\beta} y h(y)\,dy.$$


The functions $h(x) = x^n$, where n is a positive integer, and $h(x) = |x|^\alpha$, where α is a positive
real number, are of special importance. If $EX^n$ exists for some positive integer n, we
call $EX^n$ the nth moment of (the distribution function of) X about the origin. If $E|X|^\alpha < \infty$
for some positive real number α, we call $E|X|^\alpha$ the αth absolute moment of X. We shall
use the following notation:

$$m_n = EX^n, \quad \beta_\alpha = E|X|^\alpha, \tag{4}$$

whenever the expectations exist.

Example 2. Let X have the uniform distribution on the first N natural numbers, that is, let

$$P\{X = k\} = \frac{1}{N}, \quad k = 1, 2, \ldots, N.$$

Clearly, moments of all order exist:

$$EX = \sum_{k=1}^{N} k \cdot \frac{1}{N} = \frac{N + 1}{2},$$

$$EX^2 = \sum_{k=1}^{N} k^2 \cdot \frac{1}{N} = \frac{(N + 1)(2N + 1)}{6}.$$

Example 3. Let X be an RV with PDF

$$f(x) = \begin{cases} \dfrac{2}{x^3}, & x \ge 1, \\ 0, & x < 1. \end{cases}$$

Then

$$EX = \int_1^{\infty} \frac{2}{x^2}\,dx = 2.$$

But

$$EX^2 = \int_1^{\infty} \frac{2}{x}\,dx$$

does not exist. Indeed, it is easily possible to construct examples of random variables for
which all moments of a specified order exist but no higher-order moments do.
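
The closed forms of Example 2 can be confirmed with exact arithmetic, and the contrast in Example 3 can be seen from moments truncated at an upper limit b. This is a sketch assuming the density 2/x³ on [1, ∞) for Example 3; the function names and the choice N = 10 are illustrative.

```python
import math
from fractions import Fraction


def uniform_moment(N: int, order: int) -> Fraction:
    """E X^order for the uniform distribution on {1, ..., N}, computed exactly."""
    return sum(Fraction(k ** order, N) for k in range(1, N + 1))


def truncated_first_moment(b: float) -> float:
    """Integral of x * 2/x^3 over [1, b]; tends to 2 as b grows."""
    return 2.0 * (1.0 - 1.0 / b)


def truncated_second_moment(b: float) -> float:
    """Integral of x^2 * 2/x^3 over [1, b]; equals 2 log b and is unbounded."""
    return 2.0 * math.log(b)


N = 10
print(uniform_moment(N, 1), Fraction(N + 1, 2))
print(uniform_moment(N, 2), Fraction((N + 1) * (2 * N + 1), 6))
for b in (1e2, 1e4, 1e8):
    print(b, truncated_first_moment(b), truncated_second_moment(b))
```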


Example 4. Two players, A and B, play a coin-tossing game. A gives B one dollar if a 
head turns up; otherwise B pays A one dollar. If the probability that the coin shows a head 
is p, find the expected gain of A. 

Let X denote the gain of A. Then 


$$P\{X = 1\} = P\{\text{Tails}\} = 1 - p, \quad P\{X = -1\} = p$$

and

$$EX = 1 - p - p = 1 - 2p \ \begin{cases} > 0 & \text{if and only if } p < \tfrac{1}{2}, \\ = 0 & \text{if and only if } p = \tfrac{1}{2}. \end{cases}$$

Thus EX = 0 if and only if the coin is fair.

Theorem 2. If the moment of order t exists for an RV X, moments of order 0 < s < t exist.


Proof. Let X be of the continuous type with PDF f. We have

$$E|X|^s = \int_{|x|^s \le 1} |x|^s f(x)\,dx + \int_{|x|^s > 1} |x|^s f(x)\,dx \le P\{|X|^s \le 1\} + E|X|^t < \infty.$$

A similar proof can be given when X is a discrete RV.


Theorem 3. Let X be an RV on a probability space $(\Omega, \mathcal{S}, P)$. Let $E|X|^k < \infty$ for some
k > 0. Then

$$n^k P\{|X| > n\} \to 0 \quad \text{as } n \to \infty.$$


Proof. We provide the proof for the case in which X is of the continuous type with
density f. We have

$$\infty > \int |x|^k f(x)\,dx = \lim_{n \to \infty} \int_{|x| \le n} |x|^k f(x)\,dx.$$

It follows that

$$\int_{|x| > n} |x|^k f(x)\,dx \to 0 \quad \text{as } n \to \infty.$$

But

$$\int_{|x| > n} |x|^k f(x)\,dx \ge n^k P\{|X| > n\},$$

completing the proof.


Remark 8. Probabilities of the type P{|X| > n} or either of its components, P{X > n} or
P{X < −n}, are called tail probabilities. The result of Theorem 3, therefore, gives the rate
at which P{|X| > n} converges to 0 as n → ∞.


Remark 9. The converse of Theorem 3 does not hold in general, that is,

$$n^k P\{|X| > n\} \to 0 \quad \text{as } n \to \infty \text{ for some } k$$

does not necessarily imply that $E|X|^k < \infty$, for the RV

$$P\{X = n\} = \frac{c}{n^2 \log n}, \quad n = 2, 3, \ldots,$$

where c is a constant determined from

$$\sum_{n=2}^{\infty} \frac{c}{n^2 \log n} = 1.$$

We have

$$P\{X > n\} \sim c \int_n^{\infty} \frac{1}{x^2 \log x}\,dx \sim c n^{-1} (\log n)^{-1}$$

and nP{X > n} → 0 as n → ∞. (Here and subsequently ~ means that the ratio of two
sides → 1 as n → ∞.) But

$$EX = \sum_{n=2}^{\infty} \frac{c}{n \log n} = \infty.$$

In fact, we need

$$n^{k + \delta} P\{|X| > n\} \to 0 \quad \text{as } n \to \infty$$

for some δ > 0 to ensure that $E|X|^k < \infty$. A condition such as this is called a moment
condition.


For the proof we need the following lemma.


Lemma 1. Let X be a nonnegative RV with distribution function F. Then

$$EX = \int_0^{\infty} [1 - F(x)]\,dx, \tag{5}$$

in the sense that, if either side exists, so does the other and the two are equal.


Proof. If X is of the continuous type with density f and EX < ∞, then

$$EX = \int_0^{\infty} x f(x)\,dx = \lim_{n \to \infty} \int_0^n x f(x)\,dx.$$

On integration by parts we obtain

$$\int_0^n x f(x)\,dx = nF(n) - \int_0^n F(x)\,dx = -n[1 - F(n)] + \int_0^n [1 - F(x)]\,dx.$$

But

$$n[1 - F(n)] = n \int_n^{\infty} f(x)\,dx \le \int_n^{\infty} x f(x)\,dx,$$

and, since E|X| < ∞, it follows that

$$n[1 - F(n)] \to 0 \quad \text{as } n \to \infty.$$

We have

$$EX = \lim_{n \to \infty} \int_0^n x f(x)\,dx = \lim_{n \to \infty} \int_0^n [1 - F(x)]\,dx = \int_0^{\infty} [1 - F(x)]\,dx.$$

If $\int_0^{\infty} [1 - F(x)]\,dx < \infty$, then

$$\int_0^n x f(x)\,dx \le \int_0^n [1 - F(x)]\,dx \le \int_0^{\infty} [1 - F(x)]\,dx,$$

and it follows that E|X| < ∞.
We leave the reader to complete the proof in the discrete case.
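
For a nonnegative integer-valued RV the integral in (5) reduces to a sum, because 1 − F(x) is constant on each interval [n, n + 1). The sketch below illustrates this for a Poisson RV; the parameter 3.5, the cutoff at 100, and the helper names are assumptions made for the example.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def mean_from_tail(pmf, upper: int) -> float:
    """Integral of 1 - F over [0, upper) for an RV taking values 0, 1, 2, ...

    On [n, n + 1) the DF equals P{X <= n}, so each unit interval
    contributes 1 - P{X <= n} = P{X > n}.
    """
    total, cdf = 0.0, 0.0
    for n in range(upper):
        cdf += pmf(n)
        total += 1.0 - cdf
    return total


lam = 3.5
tail_mean = mean_from_tail(lambda k: poisson_pmf(k, lam), upper=100)
direct_mean = sum(k * poisson_pmf(k, lam) for k in range(100))
print(tail_mean, direct_mean)
```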


Corollary. For any RV X, E|X| < ∞ if and only if the integrals $\int_{-\infty}^{0} P\{X \le x\}\,dx$ and
$\int_0^{\infty} P\{X > x\}\,dx$ both converge, and in that case

$$EX = \int_0^{\infty} P\{X > x\}\,dx - \int_{-\infty}^{0} P\{X \le x\}\,dx.$$

Actually we can get a little more out of Lemma 1 than the above corollary. In fact,

$$E|X|^\alpha = \int_0^{\infty} P\{|X|^\alpha > x\}\,dx = \alpha \int_0^{\infty} x^{\alpha - 1} P\{|X| > x\}\,dx,$$

and we see that an RV X possesses an absolute moment of order α > 0 if and only if
$|x|^{\alpha - 1} P\{|X| > x\}$ is integrable over (0, ∞).
A simple application of the integral test leads to the following moments lemma.

Lemma 2.

$$E|X|^\alpha < \infty \iff \sum_{n=1}^{\infty} P\{|X| > n^{1/\alpha}\} < \infty. \tag{6}$$

Note that an immediate consequence of Lemma 2 is Theorem 3. We are now ready to
prove the following result.


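Theorem 4. Let X be an RV with a distribution satisfying $n^\alpha P\{|X| > n\} \to 0$ as n → ∞
for some α > 0. Then $E|X|^\beta < \infty$ for 0 < β < α.


Proof. Given ε > 0, we can choose an N = N(ε) such that

$$P\{|X| > n\} < \frac{\varepsilon}{n^\alpha} \quad \text{for all } n \ge N.$$

It follows that for 0 < β < α

$$E|X|^\beta = \beta \int_0^N x^{\beta - 1} P\{|X| > x\}\,dx + \beta \int_N^{\infty} x^{\beta - 1} P\{|X| > x\}\,dx
\le N^\beta + \beta \varepsilon \int_N^{\infty} x^{\beta - \alpha - 1}\,dx < \infty.$$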


Remark 10. Using Theorems 3 and 4, we demonstrate the existence of random variables
for which moments of any order do not exist, that is, for which $E|X|^\alpha = \infty$ for every α > 0.
For such an RV $n^\alpha P\{|X| > n\} \not\to 0$ as n → ∞ for any α > 0. Consider, for example, the
RV X with PDF

$$f(x) = \begin{cases} \dfrac{1}{2|x| (\log |x|)^2} & \text{for } |x| > e, \\ 0 & \text{otherwise}. \end{cases}$$

The DF of X is given by

$$F(x) = \begin{cases} \dfrac{1}{2 \log |x|} & \text{if } x \le -e, \\[2mm] \dfrac{1}{2} & \text{if } -e < x < e, \\[2mm] 1 - \dfrac{1}{2 \log x} & \text{if } x \ge e. \end{cases}$$

Then for x > e

$$P\{|X| > x\} = 1 - F(x) + F(-x) = \frac{1}{\log x},$$

and $x^\alpha P\{|X| > x\} \to \infty$ as x → ∞ for any α > 0. It follows that $E|X|^\alpha = \infty$ for every
α > 0. In this example we see that P{|X| > cx}/P{|X| > x} → 1 as x → ∞ for every
c > 0. A positive function L(·) defined on (0, ∞) is said to be a function of slow variation
if and only if L(cx)/L(x) → 1 as x → ∞ for every c > 0. For such a function $x^\alpha L(x) \to \infty$
for every α > 0 (see Feller [26, pp. 275–279]). It follows that, if P{|X| > x} is slowly
varying, $E|X|^\alpha = \infty$ for every α > 0. Functions of slow variation play an important role
in the theory of probability.


Random variables for which P{|X| > x} is slowly varying are clearly excluded from
the domain of the following result.


Theorem 5. Let X be an RV satisfying

$$\frac{P\{|X| > cx\}}{P\{|X| > x\}} \to 0 \quad \text{as } x \to \infty \text{ for all } c > 1; \tag{7}$$

then X possesses moments of all orders. (Note that, if c = 1, the limit in (7) is 1, whereas
if c < 1, the limit will not go to 0 since P{|X| > cx} ≥ P{|X| > x}.)


Proof. Let ε > 0 (we will choose ε later), choose $x_0$ so large that

$$\frac{P\{|X| > cx\}}{P\{|X| > x\}} < \varepsilon \quad \text{for all } x \ge x_0, \tag{8}$$

and choose $x_1$ so large that

$$P\{|X| > x\} < \varepsilon \quad \text{for all } x \ge x_1. \tag{9}$$

Let $N = \max(x_0, x_1)$. We have, for a fixed positive integer r,

$$\frac{P\{|X| > c^r x\}}{P\{|X| > x\}} = \prod_{p=1}^{r} \frac{P\{|X| > c^p x\}}{P\{|X| > c^{p-1} x\}} \le \varepsilon^r \tag{10}$$

for x ≥ N. Thus for x ≥ N we have, in view of (9),

$$P\{|X| > c^r x\} \le \varepsilon^{r+1}. \tag{11}$$

Next note that, for any fixed positive integer n,

$$E|X|^n = n \int_0^{\infty} x^{n-1} P\{|X| > x\}\,dx
= n \int_0^N x^{n-1} P\{|X| > x\}\,dx + n \int_N^{\infty} x^{n-1} P\{|X| > x\}\,dx. \tag{12}$$

Since the first integral in (12) is finite, we need only show that the second integral is also
finite. We have

$$\int_N^{\infty} x^{n-1} P\{|X| > x\}\,dx = \sum_{r=1}^{\infty} \int_{c^{r-1} N}^{c^r N} x^{n-1} P\{|X| > x\}\,dx
\le \sum_{r=1}^{\infty} (c^r N)^{n-1} \varepsilon^r \cdot 2 c^r N
= 2N^n \sum_{r=1}^{\infty} (\varepsilon c^n)^r
= 2N^n \frac{\varepsilon c^n}{1 - \varepsilon c^n} < \infty,$$

provided that we choose ε such that $\varepsilon c^n < 1$. It follows that $E|X|^n < \infty$ for n = 1, 2, ....
Actually we have shown that (7) implies $E|X|^\delta < \infty$ for all δ > 0.


Theorem 6. If $h_1, h_2, \ldots, h_n$ are Borel-measurable functions of an RV X and $Eh_i(X)$ exists
for i = 1, 2, ..., n, then $E\{\sum_{i=1}^{n} h_i(X)\}$ exists and equals $\sum_{i=1}^{n} Eh_i(X)$.


Definition 1. Let k be a positive integer and c be a constant. If $E(X - c)^k$ exists, we call
it the moment of order k about the point c. If we take c = EX = μ, which exists since
E|X| < ∞, we call $E(X - \mu)^k$ the central moment of order k or the moment of order k
about the mean. We shall write

$$\mu_k = E\{X - \mu\}^k.$$

If we know $m_1, m_2, \ldots, m_k$, we can compute $\mu_1, \mu_2, \ldots, \mu_k$, and conversely. We have

$$\mu_k = E\{X - \mu\}^k = m_k - \binom{k}{1} \mu m_{k-1} + \binom{k}{2} \mu^2 m_{k-2} - \cdots + (-1)^k \mu^k \tag{13}$$

and

$$m_k = E\{X - \mu + \mu\}^k = \mu_k + \binom{k}{1} \mu \mu_{k-1} + \binom{k}{2} \mu^2 \mu_{k-2} + \cdots + \mu^k. \tag{14}$$
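
Relations (13) and (14) are binomial expansions and can be sketched in a few lines. The function names below are ours, the lists are indexed by the order of the moment (with entry 0 equal to 1), and a fair die is used as an assumed test distribution.

```python
from fractions import Fraction
from math import comb


def central_from_raw(m):
    """Given m[j] = E X^j for j = 0..k (m[0] = 1), return mu[j] = E(X - m[1])^j, as in (13)."""
    mean = m[1]
    return [
        sum(comb(k, j) * (-mean) ** j * m[k - j] for j in range(k + 1))
        for k in range(len(m))
    ]


def raw_from_central(mu, mean):
    """Inverse map, as in (14): m_k = sum_j C(k, j) mean^j mu_{k-j}, with mu[0] = 1, mu[1] = 0."""
    return [
        sum(comb(k, j) * mean ** j * mu[k - j] for j in range(k + 1))
        for k in range(len(mu))
    ]


# Raw moments of a fair die, computed exactly
raw = [sum(Fraction(x ** j, 6) for x in range(1, 7)) for j in range(5)]
central = central_from_raw(raw)
print(central[2], central[3])          # variance and third central moment
print(raw_from_central(central, raw[1]) == raw)
```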


The case k = 2 is of special importance. 


Definition 2. If $EX^2$ exists, we call $E\{X - \mu\}^2$ the variance of X, and we write $\sigma^2 =
\operatorname{var}(X) = E\{X - \mu\}^2$. The quantity σ is called the standard deviation (SD) of X.


From Theorem 6 we see that

$$\sigma^2 = \mu_2 = EX^2 - (EX)^2. \tag{15}$$

Variance has some important properties.


Theorem 7. Var(X) = 0 if and only if X is degenerate.

Theorem 8. Var(X) < $E(X - c)^2$ for any c ≠ EX.

Proof. We have

$$\operatorname{var}(X) = E\{X - \mu\}^2 = E\{X - c\}^2 - (c - \mu)^2.$$

Note that

$$\operatorname{var}(aX + b) = a^2 \operatorname{var}(X).$$

Let $E|X|^2 < \infty$. Then we define

$$Z = \frac{X - EX}{\sqrt{\operatorname{var}(X)}} = \frac{X - \mu}{\sigma} \tag{16}$$

and see that EZ = 0 and var(Z) = 1. We call Z a standardized RV.


Example 5. Let X be an RV with binomial PMF

$$P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, 2, \ldots, n; \quad 0 < p < 1.$$

Then

$$\begin{aligned}
EX &= \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n - k} \\
&= np \sum_{k=1}^{n} \binom{n - 1}{k - 1} p^{k - 1} (1 - p)^{n - k} \\
&= np; \\
EX^2 &= E\{X(X - 1) + X\} \\
&= \sum_{k=0}^{n} k(k - 1) \binom{n}{k} p^k (1 - p)^{n - k} + np \\
&= n(n - 1)p^2 + np; \\
\operatorname{var}(X) &= n(n - 1)p^2 + np - n^2 p^2 \\
&= np(1 - p); \\
EX^3 &= E\{X(X - 1)(X - 2) + 3X(X - 1) + X\} \\
&= n(n - 1)(n - 2)p^3 + 3n(n - 1)p^2 + np; \\
\mu_3 &= m_3 - 3\mu m_2 + 2\mu^3 \\
&= n(n - 1)(n - 2)p^3 + 3n(n - 1)p^2 + np - 3np[n(n - 1)p^2 + np] + 2n^3 p^3 \\
&= np(1 - p)(1 - 2p).
\end{aligned}$$

In the above example we computed factorial moments $EX(X - 1)(X - 2) \cdots (X - k + 1)$
for various values of k. For some discrete integer-valued RVs whose PMF contains factorials
or binomial coefficients it may be more convenient to compute factorial moments.

[Fig. 1. Quantile of order p.]

We have seen that for some distributions even the mean does not exist. We next consider 
some parameters, called order parameters, which always exist. 


Definition 3. A number x (Fig. 1) satisfying

$$P\{X \le x\} \ge p, \quad P\{X \ge x\} \ge 1 - p, \quad 0 < p < 1, \tag{17}$$

is called a quantile of order p [or (100p)th percentile] for the RV X (or, for the DF F of X).
We write $\mathfrak{z}_p(X)$ for a quantile of order p for the RV X.


If x is a quantile of order p for an RV X with DF F, then

$$p \le F(x) \le p + P\{X = x\}. \tag{18}$$

If P{X = x} = 0, as is the case, in particular, if X is of the continuous type, a quantile
of order p is a solution of the equation

$$F(x) = p. \tag{19}$$

If F is strictly increasing, (19) has a unique solution. Otherwise (Fig. 2) there may be
many (even uncountably many) solutions of (19), each of which is then called a quantile
of order p. Quantiles are of a great deal of interest in testing of hypotheses.


Definition 4. Let X be an RV with DF F. A number x satisfying

$$\frac{1}{2} \le F(x) \le \frac{1}{2} + P\{X = x\} \tag{20}$$

or, equivalently,

$$P\{X \le x\} \ge \frac{1}{2} \quad \text{and} \quad P\{X \ge x\} \ge \frac{1}{2} \tag{21}$$

is called a median of X (or F).

[Fig. 2. (a) Unique quantile and (b) infinitely many solutions of F(x) = p.]


Again we note that there may be many values that satisfy (20) or (21). Thus a median 
is not necessarily unique. 

If F is a symmetric DF, the center of symmetry is clearly the median of the DF F. 
The median is an important centering constant especially in cases where the mean of the 
distribution does not exist. 


Example 6. Let X be an RV with Cauchy PDF

$$f(x) = \frac{1}{\pi} \frac{1}{1 + x^2}, \quad -\infty < x < \infty.$$

Then E|X| is not finite but $E|X|^\delta < \infty$ for 0 < δ < 1. The median of the RV X is
clearly x = 0.


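Example 7. Let X be an RV with PMF

    $P\{X = -2\} = P\{X = 0\} = \frac{1}{4}, \quad P\{X = 1\} = \frac{1}{3}, \quad P\{X = 2\} = \frac{1}{6}.$

Then

    $P\{X \le 0\} = \frac{1}{2} \quad \text{and} \quad P\{X \ge 0\} = \frac{3}{4} > \frac{1}{2}.$

In fact, if x is any number such that 0 < x < 1, then

    $P\{X \le x\} = P\{X = -2\} + P\{X = 0\} = \frac{1}{2}$

and

    $P\{X \ge x\} = P\{X = 1\} + P\{X = 2\} = \frac{1}{2},$

and it follows that every x, $0 \le x < 1$, is a median of the RV X.
If p = 0.2, the quantile of order p is x = -2, since

    $P\{X \le -2\} = \frac{1}{4} > p \quad \text{and} \quad P\{X \ge -2\} = 1 > 1 - p.$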
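The two defining inequalities in (17) are easy to check mechanically for a discrete RV. The following is a minimal sketch, assuming the PMF of Example 7 takes the values 1/4, 1/4, 1/3, 1/6 at -2, 0, 1, 2; the helper name `is_quantile` is ours and purely illustrative.

```python
from fractions import Fraction as Fr

# PMF assumed for Example 7: P{X=-2} = P{X=0} = 1/4, P{X=1} = 1/3, P{X=2} = 1/6.
pmf = {-2: Fr(1, 4), 0: Fr(1, 4), 1: Fr(1, 3), 2: Fr(1, 6)}

def is_quantile(x, p, pmf):
    """Return True if P{X <= x} >= p and P{X >= x} >= 1 - p, i.e. condition (17)."""
    left = sum(q for v, q in pmf.items() if v <= x)    # P{X <= x}
    right = sum(q for v, q in pmf.items() if v >= x)   # P{X >= x}
    return left >= p and right >= 1 - p

print(is_quantile(Fr(1, 2), Fr(1, 2), pmf))   # a point inside [0, 1) as a median
print(is_quantile(-2, Fr(1, 5), pmf))         # x = -2 as a quantile of order 0.2
print(is_quantile(0, Fr(1, 5), pmf))          # x = 0 as a quantile of order 0.2
```

Exact rational arithmetic is used so that boundary cases such as P{X <= x} = 1/2 are not disturbed by rounding.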


PROBLEMS 3.2 


1. Find the expected number of throws of a fair die until a 6 is obtained. 


2. From a box containing N identical tickets numbered 1 through N, n tickets are
drawn with replacement. Let X be the largest number drawn. Find EX.


3. Let X be an RV with PDF

    $f(x) = \frac{c}{(1+x^2)^m}, \quad -\infty < x < \infty, \quad m \ge 1,$

where $c = \Gamma(m)/[\Gamma(1/2)\Gamma(m - 1/2)]$. Show that $EX^{2r}$ exists if and only if $2r < 2m - 1$. What is $EX^{2r}$ if $2r < 2m - 1$?


4. Let X be an RV with PDF

    $f(x) = \begin{cases} \dfrac{k a^k}{(x+a)^{k+1}} & \text{if } x \ge 0, \\ 0 & \text{otherwise} \end{cases} \quad (a > 0).$

Show that $E|X|^{\alpha} < \infty$ for $\alpha < k$. Find the quantile of order p for the RV X.

5. Let X be an RV such that $E|X| < \infty$. Show that $E|X - c|$ is minimized if we choose
c equal to the median of the distribution of X.


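6. Pareto’s distribution with parameters $\alpha$ and $\beta$ (both $\alpha$ and $\beta$ positive) is defined
by the PDF

    $f(x) = \begin{cases} \dfrac{\beta \alpha^{\beta}}{x^{\beta+1}} & \text{if } x \ge \alpha, \\ 0 & \text{if } x < \alpha. \end{cases}$

Show that the moment of order n exists if and only if $n < \beta$. Let $\beta > 2$. Find the
mean and the variance of the distribution.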


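7. For an RV X with PDF

    $f(x) = \begin{cases} \frac{1}{2}x & \text{if } 0 \le x < 1, \\ \frac{1}{2} & \text{if } 1 \le x \le 2, \\ \frac{1}{2}(3 - x) & \text{if } 2 < x \le 3, \end{cases}$

show that moments of all order exist. Find the mean and the variance of X.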


8. For the PMF of Example 5 show that

    $EX^4 = np + 7n(n-1)p^2 + 6n(n-1)(n-2)p^3 + n(n-1)(n-2)(n-3)p^4$

and

    $\mu_4 = 3(npq)^2 + npq(1 - 6pq),$

where $0 \le p \le 1$, $q = 1 - p$.


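9. For the Poisson RV X with PMF

    $P\{X = x\} = e^{-\lambda}\frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots,$

show that $EX = \lambda$, $EX^2 = \lambda + \lambda^2$, $EX^3 = \lambda + 3\lambda^2 + \lambda^3$, $EX^4 = \lambda + 7\lambda^2 + 6\lambda^3 + \lambda^4$,
and $\mu_2 = \mu_3 = \lambda$, $\mu_4 = \lambda + 3\lambda^2$.

10. For any RV X with $E|X|^4 < \infty$ define

    $\alpha_3 = \frac{\mu_3}{(\mu_2)^{3/2}}, \quad \alpha_4 = \frac{\mu_4}{\mu_2^2}.$

Here $\alpha_3$ is known as the coefficient of skewness and is sometimes used as a measure
of asymmetry, and $\alpha_4$ is known as kurtosis and is used to measure the peakedness
(“flatness of the top”) of a distribution.

Compute $\alpha_3$ and $\alpha_4$ for the PMFs of Problems 8 and 9.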


11. For a positive RV X define the negative moment of order n by $EX^{-n}$, where n > 0
is an integer. Find $E\{1/(X+1)\}$ for the PMFs of Example 5 and Problem 9.

12. Prove Theorem 6.

13. Prove Theorem 7.

14. In each of the following cases, compute EX, var(X), and $EX^n$ (for $n \ge 0$, an integer)
whenever they exist.

(a) $f(x) = 1$, $-1/2 \le x \le 1/2$, and 0 elsewhere.

(b) $f(x) = e^{-x}$, $x \ge 0$, and 0 elsewhere.

(c) $f(x) = (k-1)/x^k$, $x \ge 1$, and 0 elsewhere; k > 1 is a constant.

(d) $f(x) = 1/[\pi(1+x^2)]$, $-\infty < x < \infty$.

(e) $f(x) = 6x(1-x)$, $0 < x < 1$, and 0 elsewhere.

(f) $f(x) = xe^{-x}$, $x \ge 0$, and 0 elsewhere.

(g) $P(X = x) = p(1-p)^{x-1}$, x = 1, 2, ..., and 0 elsewhere; 0 < p < 1.
15. Find the quantile of order p (0 < p < 1) for the following distributions.

(a) $f(x) = 1/x^2$, $x \ge 1$, and 0 elsewhere.

(b) $f(x) = 2x\exp(-x^2)$, $x \ge 0$, and 0 otherwise.

(c) $f(x) = 1/\theta$, $0 \le x \le \theta$, and 0 elsewhere.

(d) $P(X = x) = \theta(1-\theta)^{x-1}$, x = 1, 2, ..., and 0 otherwise; $0 < \theta < 1$.

(e) $f(x) = (1/\beta^2)\,x\exp(-x/\beta)$, x > 0, and 0 otherwise; $\beta > 0$.

(f) $f(x) = (3/b^3)(b-x)^2$, $0 \le x \le b$, and 0 elsewhere.


3.3. GENERATING FUNCTIONS 
In this section we consider some functions that generate probabilities or moments of an 
RV. The simplest type of generating function in probability theory is the one associated 
with integer-valued RVs. Let X be an RV, and let

    $p_k = P\{X = k\}, \quad k = 0, 1, 2, \ldots,$

with $\sum_{k=0}^{\infty} p_k = 1$.

Definition 1. The function defined by

    $P(s) = \sum_{k=0}^{\infty} p_k s^k,$    (1)

which surely converges for $|s| \le 1$, is called the probability generating function (PGF)
of X.


Example 1. Consider the Poisson RV with PMF

    $P\{X = k\} = e^{-\lambda}\frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots.$

We have

    $P(s) = \sum_{k=0}^{\infty} (s\lambda)^k \frac{e^{-\lambda}}{k!} = e^{-\lambda}e^{s\lambda} = e^{-\lambda(1-s)} \quad \text{for all } s.$


Example 2. Let X be an RV with geometric distribution, that is, let

    $P\{X = k\} = pq^k, \quad k = 0, 1, 2, \ldots; \quad 0 < p < 1, \quad q = 1 - p.$

Then

    $P(s) = \sum_{k=0}^{\infty} s^k p q^k = \frac{p}{1 - sq}, \quad |s| \le 1.$


Remark 1. Since P(1) = 1, series (1) is uniformly and absolutely convergent in $|s| \le 1$
and the PGF P is a continuous function of s. It determines the PMF uniquely, since P(s)
can be represented in a unique manner as a power series.


Remark 2. Since a power series with radius of convergence r can be differentiated
termwise any number of times in (-r, r), it follows that

    $P^{(k)}(s) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\,P(X = n)\,s^{n-k},$

where $P^{(k)}$ is the kth derivative of P. The series converges at least for -1 < s < 1. For s = 1
the right side reduces formally to $E\{X(X-1)\cdots(X-k+1)\}$, which is the kth factorial
moment of X whenever it exists. In particular, if $EX < \infty$ then $P'(1) = EX$, and if $EX^2 < \infty$
then $P''(1) = EX(X-1)$ and $\mathrm{Var}(X) = EX^2 - (EX)^2 = P''(1) - [P'(1)]^2 + P'(1)$.
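As an illustration of Remark 2, the sketch below takes the geometric PGF $P(s) = p/(1-sq)$ of Example 2, evaluates $P'(1)$ and $P''(1)$ in closed form, and compares the resulting mean and variance with a direct (truncated) summation over the PMF. The function names and the truncation point are our own assumptions, chosen only for the illustration.

```python
def geometric_moments_from_pgf(p):
    """Mean and variance from P'(1) and P''(1) for P(s) = p / (1 - s q)."""
    q = 1.0 - p
    d1 = p * q / (1.0 - q) ** 2              # P'(1)  = EX
    d2 = 2.0 * p * q ** 2 / (1.0 - q) ** 3   # P''(1) = EX(X - 1)
    return d1, d2 - d1 ** 2 + d1             # Var(X) = P''(1) - [P'(1)]^2 + P'(1)

def geometric_moments_by_summation(p, terms=2000):
    """Mean and variance by summing k p q^k and k^2 p q^k over k < terms."""
    q = 1.0 - p
    mean = sum(k * p * q ** k for k in range(terms))
    second = sum(k * k * p * q ** k for k in range(terms))
    return mean, second - mean ** 2

print(geometric_moments_from_pgf(0.3))
print(geometric_moments_by_summation(0.3))
```

Both routes should agree up to rounding, and both should match the values q/p and q/p^2.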
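Example 3. In Example 1 we found that $P(s) = e^{-\lambda(1-s)}$, $|s| \le 1$, for a Poisson RV. Thus

    $P'(s) = \lambda e^{-\lambda(1-s)},$
    $P''(s) = \lambda^2 e^{-\lambda(1-s)}.$

Also, $EX = \lambda$, $E\{X^2 - X\} = \lambda^2$, so that $\mathrm{var}(X) = EX^2 - (EX)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$.

In Example 2 we computed $P(s) = p/(1 - sq)$, so that

    $P'(s) = \frac{pq}{(1 - sq)^2} \quad \text{and} \quad P''(s) = \frac{2pq^2}{(1 - sq)^3}.$

Thus

    $EX = \frac{q}{p}, \quad EX^2 = \frac{q}{p} + \frac{2q^2}{p^2}, \quad \mathrm{var}(X) = \frac{q^2}{p^2} + \frac{q}{p} = \frac{q}{p^2}.$

Expanding the right side into a power series we get

    $P(s) = \sum_{k=0}^{n} p_k s^k = \frac{1}{2^n}\sum_{k=0}^{n} \binom{n}{k} s^k,$

and it follows that

    $P(X = k) = p_k = \binom{n}{k}\Big/ 2^n, \quad k = 0, 1, \ldots, n.$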


We note that the PGF, being defined only for discrete integer-valued RVs, has limited
utility. We next consider a generating function which is quite useful in probability and
statistics.

Definition 2. Let X be an RV defined on $(\Omega, \mathcal{S}, P)$. The function

    $M(s) = Ee^{sX}$    (2)

is known as the moment generating function (MGF) of the RV X if the expectation on the
right side of (2) exists in some neighborhood of the origin.


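Example 5. Let X have the PMF

    $f(k) = \begin{cases} \dfrac{6}{\pi^2}\cdot\dfrac{1}{k^2}, & k = 1, 2, \ldots, \\ 0, & \text{otherwise.} \end{cases}$

Then $(6/\pi^2)\sum_{k=1}^{\infty} e^{sk}/k^2$ is infinite for every s > 0. We see that the MGF of X does not
exist. In fact, $EX = \infty$.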


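Example 6. Let X have the PDF

    $f(x) = \begin{cases} \frac{1}{2}e^{-x/2}, & x > 0, \\ 0, & \text{otherwise.} \end{cases}$

Then

    $M(s) = \frac{1}{2}\int_0^{\infty} e^{(s - 1/2)x}\,dx = \frac{1}{1 - 2s}, \quad s < \frac{1}{2}.$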


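Example 7. Let X have the PMF

    $P\{X = k\} = \begin{cases} e^{-\lambda}\dfrac{\lambda^k}{k!}, & k = 0, 1, 2, \ldots, \\ 0, & \text{otherwise.} \end{cases}$

Then

    $M(s) = Ee^{sX} = e^{-\lambda}\sum_{k=0}^{\infty} e^{sk}\frac{\lambda^k}{k!} = e^{-\lambda(1 - e^s)} \quad \text{for all } s.$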


The following result will be quite useful subsequently. 


Theorem 1. The MGF uniquely determines a DF and, conversely, if the MGF exists, it is 
unique. 


For the proof we refer the reader to Widder [117, p. 460], or Curtiss [19]. Theorem 2 
explains why we call M(s) an MGF. 


Theorem 2. If the MGF M(s) of an RV X exists for s in $(-s_0, s_0)$, say, $s_0 > 0$, the
derivatives of all order exist at s = 0 and can be evaluated under the integral sign, that is,

    $M^{(k)}(s)\big|_{s=0} = EX^k \quad \text{for positive integral } k.$    (3)


For the proof of Theorem 2 we refer to Widder [117, pp. 446-447]. See also Problem 9. 


Remark 3. Alternatively, if the MGF M(s) exists for s in $(-s_0, s_0)$, say, $s_0 > 0$, one can
express M(s) (uniquely) in a Maclaurin series expansion:

    $M(s) = M(0) + \frac{M'(0)}{1!}s + \frac{M''(0)}{2!}s^2 + \cdots,$    (4)

so that $EX^k$ is the coefficient of $s^k/k!$ in expansion (4).


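Example 8. Let X be an RV with PDF $f(x) = (1/2)e^{-x/2}$, x > 0. From Example 6, $M(s) = 1/(1 - 2s)$ for s < 1/2. Thus

    $M'(s) = \frac{2}{(1 - 2s)^2} \quad \text{and} \quad M''(s) = \frac{4 \cdot 2}{(1 - 2s)^3}, \quad s < \frac{1}{2}.$

It follows that

    $EX = 2, \quad EX^2 = 8, \quad \text{and} \quad \mathrm{var}(X) = 4.$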
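The relation $M^{(k)}(0) = EX^k$ can also be checked numerically. The sketch below approximates $M'(0)$ and $M''(0)$ for $M(s) = 1/(1-2s)$ by central differences; the step size h and the helper names are our assumptions, and the output is only accurate to the order of the finite-difference error.

```python
def mgf(s):
    """MGF of the PDF f(x) = (1/2) e^{-x/2}, x > 0; valid for s < 1/2."""
    return 1.0 / (1.0 - 2.0 * s)

def moments_by_differences(M, h=1e-4):
    """Approximate M'(0) and M''(0) by central differences with step h."""
    first = (M(h) - M(-h)) / (2.0 * h)
    second = (M(h) - 2.0 * M(0.0) + M(-h)) / (h * h)
    return first, second

ex, ex2 = moments_by_differences(mgf)
print(ex, ex2, ex2 - ex ** 2)   # approximately 2, 8, and 4
```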
Example 9. Let X be an RV with PDF f(x) = 1, $0 \le x \le 1$, and = 0 otherwise. Then

    $M(s) = \int_0^1 e^{sx}\,dx = \frac{e^s - 1}{s}, \quad \text{all } s,$

    $M'(s) = \frac{e^s \cdot s - (e^s - 1)\cdot 1}{s^2},$

    $EX = M'(0) = \lim_{s \to 0} \frac{se^s - e^s + 1}{s^2} = \frac{1}{2}.$

We emphasize that the expectation $Ee^{sX}$ does not exist unless s is carefully restricted.
In fact, the requirement that M(s) exists in a neighborhood of zero is a very strong requirement
that is not satisfied by some common distributions. We next consider a generating
function which exists for all distributions.


Definition 3. Let X be an RV. The complex-valued function $\phi$ defined on $\mathcal{R}$ by

    $\phi(t) = E(e^{itX}) = E(\cos tX) + iE(\sin tX), \quad t \in \mathcal{R},$

where $i = \sqrt{-1}$ is the imaginary unit, is called the characteristic function (CF)
of RV X.

Clearly

    $\phi(t) = \sum_k (\cos tk + i\sin tk)\,P(X = k)$

in the discrete case, and

    $\phi(t) = \int_{-\infty}^{\infty} \cos tx\, f(x)\,dx + i\int_{-\infty}^{\infty} \sin tx\, f(x)\,dx$

in the continuous case.


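Example 10. Let X be a normal RV with PDF

    $f(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right), \quad x \in \mathcal{R}.$

Then

    $\phi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \cos tx\; e^{-x^2/2}\,dx + \frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \sin tx\; e^{-x^2/2}\,dx.$

Note that sin tx is an odd function and so also is $\sin tx\; e^{-x^2/2}$. Thus the second integral on
the right side vanishes and we have

    $\phi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \cos tx\; e^{-x^2/2}\,dx = \frac{2}{\sqrt{2\pi}}\int_0^{\infty} \cos tx\; e^{-x^2/2}\,dx = e^{-t^2/2}, \quad t \in \mathcal{R}.$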
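The closed form $e^{-t^2/2}$ can be compared with a direct numerical evaluation of $E\cos tX$. The following is a rough sketch using the trapezoidal rule on a truncated interval; the truncation half-width and the number of subintervals are assumptions made for the illustration only.

```python
import math

def normal_cf_numeric(t, half_width=12.0, n=24_000):
    """Trapezoidal approximation of E cos(tX) for X standard normal."""
    h = 2.0 * half_width / n
    total = 0.0
    for j in range(n + 1):
        x = -half_width + j * h
        w = 0.5 if j in (0, n) else 1.0          # end points get half weight
        total += w * math.cos(t * x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, normal_cf_numeric(t), math.exp(-0.5 * t * t))
```

The imaginary part is omitted because, as noted in the example, the integrand $\sin tx\, e^{-x^2/2}$ is odd.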


Remark 4. Unlike an MGF, which may not exist for some distributions, a CF always exists,
which makes it a much more convenient tool. In fact, it is easy to see that $\phi$ is continuous
on $\mathcal{R}$, $|\phi(t)| \le 1$ for all t, and $\phi(-t) = \bar{\phi}(t)$, where $\bar{\phi}$ is the complex conjugate of $\phi$. Thus
$\bar{\phi}$ is the CF of -X. Moreover, $\phi$ uniquely determines the DF of RV X. For these and
many other properties of characteristic functions we need a comprehensive knowledge
of complex variable theory, well beyond the scope of this book. We refer the reader to
Lukacs [69].


Finally, we consider the problem of characterizing a distribution from its moments.
Given a set of constants $\{\mu_0 = 1, \mu_1, \mu_2, \ldots\}$ the problem of moments asks if they can be
moments of a distribution function F. At this point it will be worthwhile to take note of
some facts.

First, we have seen that if the MGF $M(s) = Ee^{sX}$ exists for some X for s in some neighborhood
of zero, then $E|X|^n < \infty$ for all $n \ge 1$. Suppose, however, that $E|X|^n < \infty$ for all $n \ge 1$. It
does not follow that the MGF of X exists.


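Example 11. Let X be an RV with PDF

    $f(x) = ce^{-|x|^{\alpha}}, \quad 0 < \alpha < 1, \quad -\infty < x < \infty,$

where c is a constant determined from

    $c\int_{-\infty}^{\infty} e^{-|x|^{\alpha}}\,dx = 1.$

Let s > 0. Then

    $\int_0^{\infty} e^{sx}e^{-x^{\alpha}}\,dx = \int_0^{\infty} e^{x(s - x^{\alpha-1})}\,dx$

and since $\alpha - 1 < 0$, $\int_0^{\infty} e^{sx}e^{-x^{\alpha}}\,dx$ is not finite for any s > 0. Hence the MGF does not
exist. But

    $E|X|^n = c\int_{-\infty}^{\infty} |x|^n e^{-|x|^{\alpha}}\,dx = 2c\int_0^{\infty} x^n e^{-x^{\alpha}}\,dx < \infty \quad \text{for each } n,$

as is easily checked by substituting $y = x^{\alpha}$.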
Second, two (or more) RVs may have the same set of moments. 
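Example 12. Let X have lognormal PDF

    $f(x) = (x\sqrt{2\pi})^{-1}e^{-(\log x)^2/2}, \quad x > 0,$

and f(x) = 0 for $x \le 0$. Let $X_{\varepsilon}$, $|\varepsilon| \le 1$, have PDF

    $f_{\varepsilon}(x) = f(x)[1 + \varepsilon\sin(2\pi\log x)], \quad x \in \mathcal{R}.$

(Note that $f_{\varepsilon} \ge 0$ for all $\varepsilon$, $|\varepsilon| \le 1$, and $\int_{-\infty}^{\infty} f_{\varepsilon}(x)\,dx = 1$, so $f_{\varepsilon}$ is a PDF.) Since, however,

    $\int_0^{\infty} x^k f(x)\sin(2\pi\log x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(t^2/2) + kt}\sin(2\pi t)\,dt$
    $\qquad = \frac{1}{\sqrt{2\pi}}\,e^{k^2/2}\int_{-\infty}^{\infty} e^{-y^2/2}\sin(2\pi y)\,dy$
    $\qquad = 0,$

we see that

    $\int_0^{\infty} x^k f(x)\,dx = \int_0^{\infty} x^k f_{\varepsilon}(x)\,dx$

for all $\varepsilon$, $|\varepsilon| \le 1$, and k = 0, 1, 2, .... But $f(x) \not\equiv f_{\varepsilon}(x)$.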
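The equality of moments in Example 12 can be observed numerically. The sketch below computes the kth moment of $f_{\varepsilon}$ after the substitution t = log x, using the trapezoidal rule on a grid centred at t = k; the grid parameters and the function name `moment` are our assumptions.

```python
import math

def moment(k, eps, half_width=12.0, n=24_000):
    """k-th moment of f_eps, computed in the variable t = log x."""
    h = 2.0 * half_width / n
    total = 0.0
    for j in range(n + 1):
        t = (k - half_width) + j * h                 # grid centred at t = k
        w = 0.5 if j in (0, n) else 1.0              # trapezoidal end weights
        kernel = math.exp(k * t - 0.5 * t * t) / math.sqrt(2.0 * math.pi)
        total += w * kernel * (1.0 + eps * math.sin(2.0 * math.pi * t))
    return total * h

for k in range(4):
    print(k, moment(k, 0.0), moment(k, 0.5), math.exp(0.5 * k * k))
```

For each integer k the two computed moments should agree up to rounding, and both should be close to $e^{k^2/2}$, although the densities themselves differ.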

Third, moments of any RV X necessarily satisfy certain conditions. For example, if
$\beta_{\nu} = E|X|^{\nu}$, we will see (Theorem 3.4.3) that $(\beta_{\nu})^{1/\nu}$ is an increasing function of $\nu$.
Similarly, the quadratic form

    $E\left(\sum_{i=1}^{n} X^{\alpha_i} t_i\right)^2 \ge 0$

yields a relation between moments of various orders of X.
The following result, which we will not prove here, gives a sufficient condition for 
unique determination of F from its moments. 


Theorem 3. Let $\{m_k\}$ be the moment sequence of an RV X. If the series

    $\sum_{k=1}^{\infty} \frac{m_k}{k!}s^k$    (5)

converges absolutely for some s > 0, then $\{m_k\}$ uniquely determines the DF F of X.
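Example 13. Suppose X has PDF

    $f(x) = e^{-x} \quad \text{for } x > 0, \quad \text{and} = 0 \text{ for } x \le 0.$

Then $EX^k = \int_0^{\infty} x^k e^{-x}\,dx = k!$ and from Theorem 3

    $\sum_{k=1}^{\infty} \frac{m_k}{k!}s^k = \sum_{k=1}^{\infty} s^k = \frac{s}{1 - s}$

for 0 < s < 1, so that $\{m_k\}$ determines F uniquely. In fact, from Remark 3

    $M(s) = \sum_{k=0}^{\infty} m_k\frac{s^k}{k!} = \sum_{k=0}^{\infty} s^k = \frac{1}{1 - s},$

0 < s < 1, which is the MGF of X.

In particular if for some constant c

    $|m_k| \le c^k, \quad k = 1, 2, \ldots,$

then

    $\sum_{k=1}^{\infty} \frac{|m_k|}{k!}s^k \le \sum_{k=1}^{\infty} \frac{(cs)^k}{k!} < e^{cs} \quad \text{for } s > 0,$

and the DF of X is uniquely determined. Thus if $P\{|X| \le c\} = 1$ for some c > 0, then all
moments of X exist, satisfying $|m_k| \le c^k$, $k \ge 1$, and the DF of X is uniquely determined
from its moments.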

Finally, we mention some sufficient conditions for a moment sequence to determine a 
unique DF: 


(i) The range of the RV is finite.
(ii) (Carleman) $\sum_{k=1}^{\infty} (m_{2k})^{-1/2k} = \infty$ when the range of the RV is $(-\infty, \infty)$. If the
range is $(0, \infty)$, a sufficient condition is $\sum_{k=1}^{\infty} (m_k)^{-1/2k} = \infty$.
(iii) $\limsup_{n \to \infty} \{(m_{2n})^{1/2n}/2n\}$ is finite.


PROBLEMS 3.3 


1. Find the PGF of the RVs with the following PMFs:
(a) $P\{X = k\} = \binom{n}{k}p^k(1-p)^{n-k}$, k = 0, 1, 2, ..., n; $0 \le p \le 1$.
(b) $P\{X = k\} = [e^{-\lambda}/(1 - e^{-\lambda})](\lambda^k/k!)$, k = 1, 2, ...; $\lambda > 0$.
(c) $P\{X = k\} = pq^k(1 - q^{N+1})^{-1}$, k = 0, 1, 2, ..., N; 0 < p < 1, q = 1 - p.

2. Let X be an integer-valued RV with PGF P(s). Let $\alpha$ and $\beta$ be nonnegative integers,
and write $Y = \alpha X + \beta$. Find the PGF of Y.


3. Let X be an integer-valued RV with PGF P(s), and suppose that the MGF M(s) exists
for $s \in (-s_0, s_0)$, $s_0 > 0$. How are M(s) and P(s) related? Using $M^{(k)}(s)|_{s=0} = EX^k$
for positive integral k, find $EX^k$ in terms of the derivatives of P(s) for values of
k = 1, 2, 3, 4.

4. For the Cauchy PDF

    $f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \quad -\infty < x < \infty,$

does the MGF exist?

5. Let X be an RV with PMF

    $P\{X = j\} = p_j, \quad j = 0, 1, 2, \ldots.$

Set $P\{X > j\} = q_j$, j = 0, 1, 2, .... Clearly $q_j = p_{j+1} + p_{j+2} + \cdots$, $j \ge 0$. Write $Q(s) = \sum_{j=0}^{\infty} q_j s^j$. Then the series for Q(s) converges in |s| < 1. Show that

    $Q(s) = \frac{1 - P(s)}{1 - s} \quad \text{for } |s| < 1,$

where P(s) is the PGF of X. Find the mean and the variance of X (when they exist)
in terms of Q and its derivatives.


6. For the PMF

    $P\{X = j\} = \frac{a_j\theta^j}{f(\theta)}, \quad j = 0, 1, 2, \ldots, \quad \theta > 0,$

where $a_j \ge 0$ and $f(\theta) = \sum_{j=0}^{\infty} a_j\theta^j$, find the PGF and the MGF in terms of f.


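7. For the Laplace PDF

    $f(x) = \frac{1}{2\lambda}e^{-|x - \mu|/\lambda}, \quad -\infty < x < \infty; \quad \lambda > 0, \quad -\infty < \mu < \infty,$

show that the MGF exists and equals

    $M(t) = (1 - \lambda^2t^2)^{-1}e^{\mu t}, \quad |t| < \frac{1}{\lambda}.$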


8. For any integer-valued RV X, show that

    $\sum_{n=0}^{\infty} s^n P\{X \le n\} = (1 - s)^{-1}P(s),$

where P is the PGF of X.
9. Let X be an RV with MGF M(t), which exists for $t \in (-t_0, t_0)$, $t_0 > 0$. Show that

    $E|X|^n < n!\,s^{-n}[M(s) + M(-s)]$

for any fixed s, $0 < s < t_0$, and for each integer $n \ge 1$. Expanding $e^{tX}$ in a power
series, show that, for $t \in (-s, s)$, $0 < s < t_0$,

    $M(t) = \sum_{n=0}^{\infty} t^n\frac{EX^n}{n!}.$

(Since a power series can be differentiated term by term within the interval of
convergence, it follows that for |t| < s,

    $M^{(k)}(t)\big|_{t=0} = EX^k$

for each integer $k \ge 1$.) (Roy, LePage, and Moore [95])
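10. Let X be an integer-valued random variable with

    $E\{X(X-1)\cdots(X-k+1)\} = \begin{cases} k!\dbinom{n}{k} & \text{if } k = 0, 1, 2, \ldots, n, \\ 0 & \text{if } k > n. \end{cases}$

Show that X must be degenerate at n.
[Hint: Prove and use the fact that if $EX^k < \infty$ for all k, then

    $P(s) = \sum_{k=0}^{\infty} \frac{(s-1)^k}{k!}E\{X(X-1)\cdots(X-k+1)\}.$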


Write P(s) as 


11. Let $p(n,k) = f(n,k)/n!$ where f(n,k) is given by

    $f(n,k) = f(n-1,k) + f(n-1,k-1) + \cdots + f(n-1,k-n+1)$

for $k = 0, 1, \ldots, \binom{n}{2}$ and

    $f(n,k) = 0 \text{ for } k < 0, \quad f(1,0) = 1, \quad f(1,k) = 0 \text{ otherwise}.$

Let

    $P_n(s) = \frac{1}{n!}\sum_{k=0}^{\infty} s^k f(n,k)$

be the probability generating function of p(n,k). Show that

    $P_n(s) = (n!)^{-1}\prod_{k=1}^{n} \frac{1 - s^k}{1 - s}, \quad |s| < 1.$

($P_n$ is the generating function of Kendall’s $\tau$-statistic.)

12. For $k = 0, 1, \ldots, \binom{n+1}{2}$ let $u_n(k)$ be defined recursively by

    $u_n(k) = u_{n-1}(k - n) + u_{n-1}(k)$

with $u_0(0) = 1$, $u_0(k) = 0$ otherwise, and $u_n(k) = 0$ for k < 0. Let $P_n(s) = \sum_{k=0}^{\infty} s^k u_n(k)$ be the generating function of $\{u_n\}$. Show that

    $P_n(s) = \prod_{j=1}^{n} (1 + s^j) \quad \text{for } |s| < 1.$

If $p_n(k) = u_n(k)/2^n$, find $\{p_n(k)\}$ for n = 2, 3, 4. ($P_n$ is the generating function of the
one-sample Wilcoxon test statistic.)


3.4 SOME MOMENT INEQUALITIES 


In this section we derive some inequalities for moments of an RV. The main result of this
section is Theorem 1 (and its corollary), which gives a bound for tail probability in terms
of some moment of the random variable.


Theorem 1. Let h(X) be a nonnegative Borel-measurable function of an RV X. If Eh(X)
exists, then, for every $\varepsilon > 0$,

    $P\{h(X) \ge \varepsilon\} \le \frac{Eh(X)}{\varepsilon}.$    (1)


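Proof. We prove the result when X is discrete. Let $P\{X = x_k\} = p_k$, k = 1, 2, .... Then

    $Eh(X) = \sum_k h(x_k)p_k = \left(\sum_A + \sum_{A^c}\right)h(x_k)p_k,$

where

    $A = \{k: h(x_k) \ge \varepsilon\}.$

Then

    $Eh(X) \ge \sum_A h(x_k)p_k \ge \varepsilon\sum_A p_k = \varepsilon P\{h(X) \ge \varepsilon\}.$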


Corollary. Let $h(X) = |X|^r$ and $\varepsilon = K^r$, where r > 0 and K > 0. Then

    $P\{|X| \ge K\} \le \frac{E|X|^r}{K^r},$    (2)

which is Markov’s inequality. In particular, if we take $h(X) = (X - \mu)^2$, $\varepsilon = K^2\sigma^2$, we get the
Chebychev-Bienaymé inequality:

    $P\{|X - \mu| \ge K\sigma\} \le \frac{1}{K^2},$    (3)

where $EX = \mu$, $\mathrm{var}(X) = \sigma^2$.


Remark 1. The inequality (3) is generally attributed to Chebychev although recent
research has shown that credit should also go to I. J. Bienaymé.

Remark 2. If we wish to be consistent with our definition of a DF as $F(x) = P(X \le x)$,
then we may want to reformulate (1) in the following form.

    $P\{h(X) > \varepsilon\} \le Eh(X)/\varepsilon.$


For RVs with finite second-order moments one cannot do better than the inequality in (3). 


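Example 1. Let

    $P\{X = 0\} = 1 - \frac{1}{K^2}, \quad P\{X = \mp 1\} = \frac{1}{2K^2}, \quad K > 1, \text{ constant}.$

Then

    $EX = 0, \quad EX^2 = \frac{1}{K^2}, \quad \sigma = \frac{1}{K},$

    $P\{|X| \ge K\sigma\} = P\{|X| \ge 1\} = \frac{1}{K^2},$

so that equality is achieved.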


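Example 2. Let X be distributed with PDF f(x) = 1 if 0 < x < 1, and = 0 otherwise. Then

    $EX = \frac{1}{2}, \quad \mathrm{var}(X) = \frac{1}{12}.$

From Chebychev’s inequality

    $P\left\{\left|X - \frac{1}{2}\right| < 2\sqrt{\frac{1}{12}}\right\} \ge 1 - \frac{1}{4} = 0.75.$

In Fig. 1 we compare the upper bound for $P\{|X - 1/2| \ge k/\sqrt{12}\}$ with the exact
probability.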

Fig. 1  Chebychev upper bound versus exact probability.

It is possible to improve upon Chebychev’s inequality, at least in some cases, if we
assume the existence of higher order moments. We need the following lemma.

Lemma 1. Let X be an RV with EX = 0 and $\mathrm{var}(X) = \sigma^2$. Then

    $P\{X \ge x\} \le \frac{\sigma^2}{\sigma^2 + x^2} \quad \text{if } x > 0,$    (4)

    $P\{X \ge x\} \ge \frac{x^2}{\sigma^2 + x^2} \quad \text{if } x < 0.$    (5)


Proof. Let $h(t) = (t + c)^2$, c > 0. Then $h(t) \ge 0$ for all t and

    $h(t) \ge (x + c)^2 \quad \text{for } t \ge x > 0.$

It follows that

    $P\{X \ge x\} \le P\{h(X) \ge (x + c)^2\} \le \frac{E(X + c)^2}{(x + c)^2} \quad \text{for all } c > 0, \quad x > 0.$    (6)

Since EX = 0, $EX^2 = \sigma^2$, and the right side of (6) is minimum when $c = \sigma^2/x$, we have

    $P\{X \ge x\} \le \frac{\sigma^2}{\sigma^2 + x^2}, \quad x > 0.$

Similar proof holds for (5).


Remark 3. Inequalities (4) and (5) cannot be improved (Problem 3). 


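Theorem 2. Let $E|X|^4 < \infty$, and let EX = 0, $EX^2 = \sigma^2$. Then

    $P\{|X| \ge K\sigma\} \le \frac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4K^4 - 2K^2\sigma^4} \quad \text{for } K > 1,$    (7)

where $\mu_4 = EX^4$.

Proof. For the proof let us substitute $(X^2 - \sigma^2)/(K^2\sigma^2 - \sigma^2)$ for X and take x = 1 in (4).
Then

    $P\{X^2 - \sigma^2 \ge K^2\sigma^2 - \sigma^2\} \le \frac{\mathrm{var}\{(X^2 - \sigma^2)/(K^2\sigma^2 - \sigma^2)\}}{1 + \mathrm{var}\{(X^2 - \sigma^2)/(K^2\sigma^2 - \sigma^2)\}}$
    $\qquad = \frac{\mu_4 - \sigma^4}{\sigma^4(K^2 - 1)^2 + \mu_4 - \sigma^4}$
    $\qquad = \frac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4K^4 - 2K^2\sigma^4}, \quad K > 1,$

as asserted.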


Remark 4. Bound (7) is better than bound (3) if $K^2 \ge \mu_4/\sigma^4$ and worse if $1 \le K^2 < \mu_4/\sigma^4$
(Problem 5).


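Example 3. Let X have the uniform density

    $f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$

Then

    $EX = \frac{1}{2}, \quad \mathrm{var}(X) = \frac{1}{12}, \quad \mu_4 = E\left(X - \frac{1}{2}\right)^4 = \frac{1}{80},$

and

    $P\left\{\left|X - \frac{1}{2}\right| \ge 2\sqrt{\frac{1}{12}}\right\} \le \frac{\frac{1}{80} - \frac{1}{144}}{\frac{1}{80} + \frac{16}{144} - \frac{8}{144}} = \frac{4}{49},$

that is,

    $P\left\{\left|X - \frac{1}{2}\right| < 2\sqrt{\frac{1}{12}}\right\} \ge \frac{45}{49} \approx 0.92,$

which is much better than the bound given by Chebychev’s inequality (Example 2).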
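The comparison in Example 3 can be reproduced with a few lines of arithmetic. The sketch below evaluates bounds (3) and (7) for the uniform density on (0, 1) together with the exact tail probability; the function names are our own, and the exact tail formula is specific to this uniform case.

```python
import math

def chebyshev_bound(K):
    """Bound (3): P{|X - mu| >= K sigma} <= 1 / K^2."""
    return 1.0 / K ** 2

def fourth_moment_bound(K, var, mu4):
    """Bound (7), stated for K > 1, with sigma^2 = var and mu4 = E(X - mu)^4."""
    s4 = var ** 2
    return (mu4 - s4) / (mu4 + s4 * K ** 4 - 2.0 * K ** 2 * s4)

def exact_uniform_tail(K):
    """Exact P{|X - 1/2| >= K / sqrt(12)} for X uniform on (0, 1)."""
    return max(0.0, 1.0 - 2.0 * K / math.sqrt(12.0))

K = 2.0
print(chebyshev_bound(K))                       # 0.25
print(fourth_moment_bound(K, 1 / 12, 1 / 80))   # approximately 4/49
print(exact_uniform_tail(K))                    # 0.0
```

For K = 2 the fourth-moment bound is much tighter than Chebychev's, while the exact tail probability is zero because $2/\sqrt{12}$ exceeds 1/2.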


Theorem 3 (Lyapunov Inequality). Let $\beta_n = E|X|^n < \infty$. Then for arbitrary k, $2 \le k \le n$,
we have

    $\beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}.$    (8)


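Proof. Consider the quadratic form

    $Q(u,v) = \int_{-\infty}^{\infty} \left(u|x|^{(k-1)/2} + v|x|^{(k+1)/2}\right)^2 f(x)\,dx,$

where we have assumed that X is continuous with PDF f. We have

    $Q(u,v) = u^2\beta_{k-1} + 2uv\beta_k + \beta_{k+1}v^2.$

Clearly $Q \ge 0$ for all u, v real. It follows that

    $\begin{vmatrix} \beta_{k-1} & \beta_k \\ \beta_k & \beta_{k+1} \end{vmatrix} \ge 0,$

implying that

    $\beta_k^{2k} \le \beta_{k-1}^k\,\beta_{k+1}^k.$

Thus

    $\beta_1^2 \le \beta_0\beta_2, \quad \beta_2^4 \le \beta_1^2\beta_3^2, \quad \ldots, \quad \beta_{n-1}^{2(n-1)} \le \beta_{n-2}^{n-1}\beta_n^{n-1},$

where $\beta_0 = 1$. Multiplying successive k - 1 of these, we have

    $\beta_{k-1}^k \le \beta_k^{k-1} \quad \text{or} \quad \beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}.$

It follows that

    $\beta_1 \le \beta_2^{1/2} \le \beta_3^{1/3} \le \cdots \le \beta_n^{1/n}.$

The equality holds if and only if

    $\beta_k^{1/k} = \beta_{k+1}^{1/(k+1)} \quad \text{for } k = 1, 2, \ldots,$

that is, $\{\beta_k^{1/k}\}$ is a constant sequence of numbers, which happens if and only if |X| is
degenerate, that is, for some c, $P\{|X| = c\} = 1$.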
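A quick numerical illustration of the Lyapunov inequality: for a discrete RV the sequence $\beta_k^{1/k}$ can be computed directly and should be nondecreasing in k. The support and probabilities below are hypothetical values chosen only for the sketch.

```python
def abs_moment(values, probs, k):
    """beta_k = E|X|^k for a discrete RV with the given support and probabilities."""
    return sum(p * abs(v) ** k for v, p in zip(values, probs))

values = [-2.0, 0.5, 1.0, 3.0]    # hypothetical support
probs = [0.1, 0.4, 0.3, 0.2]      # hypothetical probabilities (sum to 1)

roots = [abs_moment(values, probs, k) ** (1.0 / k) for k in range(1, 7)]
print(roots)                       # expected to be nondecreasing in k

# Degenerate |X|: every beta_k^(1/k) equals the constant c = 2.
flat = [abs_moment([-2.0, 2.0], [0.5, 0.5], k) ** (1.0 / k) for k in range(1, 7)]
print(flat)
```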


PROBLEMS 3.4 


1. For the RV with PDF

    $f(x; \lambda) = \frac{e^{-x}x^{\lambda}}{\lambda!}, \quad x > 0,$

where $\lambda \ge 0$ is an integer, show that

    $P\{0 < X < 2(\lambda + 1)\} > \frac{\lambda}{\lambda + 1}.$


2. Let X be any RV, and suppose that the MGF of X, $M(t) = Ee^{tX}$, exists for every t > 0.
Then for any t > 0

    $P\{tX > s^2 + \log M(t)\} < e^{-s^2}.$

3. Construct an example to show that inequalities (4) and (5) cannot be improved.
4. Let g(·) be a function satisfying g(x) > 0 for x > 0, g(x) increasing for x > 0, and
$E|g(X)| < \infty$. Show that

    $P\{|X| > \varepsilon\} < \frac{Eg(|X|)}{g(\varepsilon)} \quad \text{for every } \varepsilon > 0.$

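5. Let X be an RV with EX = 0, $\mathrm{var}(X) = \sigma^2$, and $EX^4 = \mu_4$. Let K be any positive real
number. Show that

    $P\{|X| \ge K\sigma\} \le \begin{cases} 1 & \text{if } K^2 < 1, \\ \dfrac{1}{K^2} & \text{if } 1 \le K^2 < \dfrac{\mu_4}{\sigma^4}, \\ \dfrac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4K^4 - 2K^2\sigma^4} & \text{if } K^2 \ge \dfrac{\mu_4}{\sigma^4}. \end{cases}$

In other words, show that bound (7) is better than bound (3) if $K^2 \ge \mu_4/\sigma^4$ and worse
if $1 \le K^2 < \mu_4/\sigma^4$. Construct an example to show that the last inequalities cannot
be improved.

6. Use Chebychev’s inequality to show that for any k > 1, $e^{k+1} \ge k^2$.

7. For any RV X, show that

    $P\{X \ge 0\} \le \inf\{\varphi(t): t \ge 0\} \le 1,$

where $\varphi(t) = Ee^{tX}$, $0 < \varphi(t) \le \infty$.

8. Let X be an RV such that $P(a \le X \le b) = 1$ where $-\infty < a < b < \infty$. Show that
$\mathrm{var}(X) \le (b - a)^2/4$.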


MULTIPLE RANDOM VARIABLES 


4.1 INTRODUCTION 


In many experiments an observation is expressible, not as a single numerical quantity, but
as a family of several separate numerical quantities. Thus, for example, if a pair of distinguishable
dice is tossed, the outcome is a pair (x,y), where x denotes the face value on the
first die, and y, the face value on the second die. Similarly, to record the height and weight
of every person in a certain community we need a pair (x,y), where the components represent,
respectively, the height and weight of a particular individual. To be able to describe
such experiments mathematically we must study the multidimensional random variables.

In Section 4.2 we introduce the basic notations involved and study joint, marginal, 
and conditional distributions. In Section 4.3 we examine independent random variables 
and investigate some consequences of independence. Section 4.4 deals with functions of 
several random variables and their induced distributions. Section 4.5 considers moments, 
covariance, and correlation, and in Section 4.6 we study conditional expectation. The last 
section deals with ordered observations. 


4.2 MULTIPLE RANDOM VARIABLES 


In this section we study multidimensional RVs. Let $(\Omega, \mathcal{S}, P)$ be a fixed but otherwise
arbitrary probability space.

Definition 1. The collection $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ defined on $(\Omega, \mathcal{S}, P)$ into $\mathcal{R}_n$ by

    $\mathbf{X}(\omega) = (X_1(\omega), X_2(\omega), \ldots, X_n(\omega)), \quad \omega \in \Omega,$

is called an n-dimensional RV if the inverse image of every n-dimensional interval

    $I = \{(x_1, x_2, \ldots, x_n): -\infty < x_i \le a_i,\ a_i \in \mathcal{R},\ i = 1, 2, \ldots, n\}$

is also in $\mathcal{S}$, that is, if

    $\mathbf{X}^{-1}(I) = \{\omega: X_1(\omega) \le a_1, \ldots, X_n(\omega) \le a_n\} \in \mathcal{S} \quad \text{for } a_i \in \mathcal{R}.$


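Theorem 1. Let $X_1, X_2, \ldots, X_n$ be n RVs on $(\Omega, \mathcal{S}, P)$. Then $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is an
n-dimensional RV on $(\Omega, \mathcal{S}, P)$.

Proof. Let $I = \{(x_1, x_2, \ldots, x_n): -\infty < x_i \le a_i,\ i = 1, 2, \ldots, n\}$. Then

    $\{(X_1, X_2, \ldots, X_n) \in I\} = \{\omega: X_1(\omega) \le a_1, X_2(\omega) \le a_2, \ldots, X_n(\omega) \le a_n\}$
    $\qquad = \bigcap_{k=1}^{n} \{\omega: X_k(\omega) \le a_k\} \in \mathcal{S},$

as asserted.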


From now on we will restrict attention mainly to two-dimensional random variables. 
The discussion for the n-dimensional (n > 2) case is similar except when indicated. The 
development follows closely the one-dimensional case. 


Definition 2. The function F(·,·), defined by

    $F(x,y) = P\{X \le x, Y \le y\}, \quad \text{all } (x,y) \in \mathcal{R}_2,$    (1)

is known as the DF of the RV (X,Y).

Following the discussion in Section 2.3, it is easily shown that

(i) F(x,y) is nondecreasing and continuous from the right with respect to each
coordinate and
(ii) $\lim_{x \to +\infty,\ y \to +\infty} F(x,y) = F(+\infty, +\infty) = 1$,
     $\lim_{y \to -\infty} F(x,y) = F(x, -\infty) = 0$ for all x,
     $\lim_{x \to -\infty} F(x,y) = F(-\infty, y) = 0$ for all y.

But (i) and (ii) are not sufficient conditions to make any function F(·,·) a DF.


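Example 1. Let F be a function (Fig. 1) of two variables defined by

    $F(x,y) = \begin{cases} 0, & x < 0 \text{ or } x + y < 1 \text{ or } y < 0, \\ 1, & \text{otherwise.} \end{cases}$

Fig. 1

Then F satisfies both (i) and (ii) above. However, F is not a DF since

    $P\{\tfrac{1}{3} < X \le 1, \tfrac{1}{3} < Y \le 1\} = F(1,1) + F(\tfrac{1}{3}, \tfrac{1}{3}) - F(1, \tfrac{1}{3}) - F(\tfrac{1}{3}, 1)$
    $\qquad = 1 + 0 - 1 - 1 = -1 \not\ge 0.$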
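The failure of the rectangle condition in Example 1 is easy to reproduce. The sketch below codes the candidate function F and evaluates the four-corner combination on the rectangle with corners 1/3 and 1; the helper name `rectangle_mass` is ours.

```python
def F(x, y):
    """Candidate function of Example 1: 0 if x < 0 or y < 0 or x + y < 1, else 1."""
    return 0.0 if (x < 0 or y < 0 or x + y < 1) else 1.0

def rectangle_mass(F, x1, x2, y1, y2):
    """F(x2,y2) + F(x1,y1) - F(x1,y2) - F(x2,y1); must be >= 0 for a DF."""
    return F(x2, y2) + F(x1, y1) - F(x1, y2) - F(x2, y1)

print(rectangle_mass(F, 1 / 3, 1, 1 / 3, 1))   # -1.0
```

A negative value would have to be the probability of the rectangle $(1/3, 1] \times (1/3, 1]$, which is impossible, so F cannot be a DF.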
Let $x_1 < x_2$ and $y_1 < y_2$. We have

    $P\{x_1 < X \le x_2, y_1 < Y \le y_2\} = P\{X \le x_2, Y \le y_2\} + P\{X \le x_1, Y \le y_1\}$
    $\qquad\qquad - P\{X \le x_1, Y \le y_2\} - P\{X \le x_2, Y \le y_1\}$
    $\qquad = F(x_2, y_2) + F(x_1, y_1) - F(x_1, y_2) - F(x_2, y_1)$
    $\qquad \ge 0$

for all pairs $(x_1, y_1)$, $(x_2, y_2)$ with $x_1 < x_2$, $y_1 < y_2$ (see Fig. 2).

Theorem 2. A function F of two variables is a DF of some two-dimensional RV if and
only if it satisfies the following conditions:

(i) F is nondecreasing and right continuous with respect to both arguments;
(ii) $F(-\infty, y) = F(x, -\infty) = 0$ and $F(+\infty, +\infty) = 1$; and
(iii) for every $(x_1, y_1)$, $(x_2, y_2)$ with $x_1 < x_2$ and $y_1 < y_2$ the inequality

    $F(x_2, y_2) - F(x_2, y_1) + F(x_1, y_1) - F(x_1, y_2) \ge 0$    (2)

holds.

The necessity of conditions (i) to (iii) (the “only if” part of the theorem) has already been
established. The sufficiency (the “if” part) will not be proved here (see Tucker [114, p. 26]).


Theorem 2 can be generalized to the n-dimensional case in the following manner.

Fig. 2  $\{x_1 < x \le x_2, y_1 < y \le y_2\}$.


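Theorem 3. A function $F(x_1, x_2, \ldots, x_n)$ is the joint DF of some n-dimensional RV if and
only if F is nondecreasing and continuous from the right with respect to all the arguments
$x_1, x_2, \ldots, x_n$ and satisfies the following conditions:

(i) $F(-\infty, x_2, \ldots, x_n) = F(x_1, -\infty, x_3, \ldots, x_n) = \cdots = F(x_1, \ldots, x_{n-1}, -\infty) = 0$,
    $F(+\infty, +\infty, \ldots, +\infty) = 1$.

(ii) For every $(x_1, x_2, \ldots, x_n) \in \mathcal{R}_n$ and all $\varepsilon_i > 0$ (i = 1, 2, ..., n) the inequality

    $F(x_1 + \varepsilon_1, x_2 + \varepsilon_2, \ldots, x_n + \varepsilon_n)$
    $\quad - \sum_{i=1}^{n} F(x_1 + \varepsilon_1, \ldots, x_{i-1} + \varepsilon_{i-1}, x_i, x_{i+1} + \varepsilon_{i+1}, \ldots, x_n + \varepsilon_n)$
    $\quad + \sum_{i,j=1,\ i<j}^{n} F(x_1 + \varepsilon_1, \ldots, x_{i-1} + \varepsilon_{i-1}, x_i, x_{i+1} + \varepsilon_{i+1}, \ldots, x_{j-1} + \varepsilon_{j-1}, x_j, x_{j+1} + \varepsilon_{j+1}, \ldots, x_n + \varepsilon_n)$
    $\quad + \cdots + (-1)^n F(x_1, x_2, \ldots, x_n) \ge 0$    (3)

holds.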


We restrict ourselves here to two-dimensional RVs of the discrete or the continuous
type, which we now define.

Definition 3. A two-dimensional (or bivariate) RV (X,Y) is said to be of the discrete type
if it takes on pairs of values belonging to a countable set of pairs A with probability 1. We
call every pair $(x_i, y_j)$ that is assumed with positive probability $p_{ij}$ a jump point of the DF
of (X,Y) and call $p_{ij}$ the jump at $(x_i, y_j)$. Here A is the support of the distribution of (X,Y).

Clearly $\sum_{ij} p_{ij} = 1$. As for the DF of (X,Y), we have

    $F(x,y) = \sum_{B} p_{ij},$

where $B = \{(i,j): x_i \le x, y_j \le y\}$.

Definition 4. Let (X,Y) be an RV of the discrete type that takes on pairs of values
$(x_i, y_j)$, i = 1, 2, ..., and j = 1, 2, .... We call

    $p_{ij} = P\{X = x_i, Y = y_j\}, \quad i = 1, 2, \ldots, \quad j = 1, 2, \ldots,$

the joint probability mass function (PMF) of (X,Y).


Example 2. A fair die is rolled, and a fair coin is tossed independently. Let X be the face
value on the die, and let Y = 0 if a tail turns up and Y = 1 if a head turns up. Then

    $A = \{(1,0), (2,0), \ldots, (6,0), (1,1), (2,1), \ldots, (6,1)\},$

    $p_{ij} = \frac{1}{12} \quad \text{for } i = 1, 2, \ldots, 6; \quad j = 0, 1.$

The DF of (X,Y) is given by

    $F(x,y) = \begin{cases}
    0, & x < 1,\ -\infty < y < \infty;\ \ -\infty < x < \infty,\ y < 0, \\
    \frac{1}{12}, & 1 \le x < 2,\ 0 \le y < 1, \\
    \frac{1}{6}, & 2 \le x < 3,\ 0 \le y < 1;\ \ 1 \le x < 2,\ 1 \le y, \\
    \frac{1}{4}, & 3 \le x < 4,\ 0 \le y < 1, \\
    \frac{1}{3}, & 4 \le x < 5,\ 0 \le y < 1;\ \ 2 \le x < 3,\ 1 \le y, \\
    \frac{5}{12}, & 5 \le x < 6,\ 0 \le y < 1, \\
    \frac{1}{2}, & 6 \le x,\ 0 \le y < 1;\ \ 3 \le x < 4,\ 1 \le y, \\
    \frac{2}{3}, & 4 \le x < 5,\ 1 \le y, \\
    \frac{5}{6}, & 5 \le x < 6,\ 1 \le y, \\
    1, & 6 \le x,\ 1 \le y.
    \end{cases}$

Theorem 4. A collection of nonnegative numbers $\{p_{ij}: i = 1, 2, \ldots;\ j = 1, 2, \ldots\}$ satisfying
$\sum_{i,j=1}^{\infty} p_{ij} = 1$ is the PMF of some RV.

Proof. The proof of Theorem 4 is easy to construct with the help of Theorem 2.
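The DF of a discrete bivariate RV is obtained by summing the jumps $p_{ij}$ over all jump points lying to the lower left of (x, y). The sketch below does this for the die-and-coin PMF of Example 2, assuming each of the twelve pairs has probability 1/12; the function name `joint_df` is illustrative.

```python
from fractions import Fraction as Fr

# Joint PMF assumed for Example 2: die face i in 1..6, coin indicator j in {0, 1}.
pmf = {(i, j): Fr(1, 12) for i in range(1, 7) for j in (0, 1)}

def joint_df(x, y, pmf):
    """F(x, y) = sum of p_ij over jump points with x_i <= x and y_j <= y."""
    return sum(p for (xi, yj), p in pmf.items() if xi <= x and yj <= y)

print(joint_df(2.5, 0.5, pmf))   # 1/6
print(joint_df(4, 1, pmf))       # 2/3
print(joint_df(6, 1, pmf))       # 1
```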


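Definition 5. A two-dimensional RV (X,Y) is said to be of the continuous type if there
exists a nonnegative function f(·,·) such that for every pair $(x,y) \in \mathcal{R}_2$ we have

    $F(x,y) = \int_{-\infty}^{x}\left[\int_{-\infty}^{y} f(u,v)\,dv\right]du,$    (4)

where F is the DF of (X,Y). The function f is called the (joint) PDF of (X,Y).

Clearly,

    $F(+\infty, +\infty) = \lim_{x \to +\infty,\ y \to +\infty} \int_{-\infty}^{x}\int_{-\infty}^{y} f(u,v)\,dv\,du = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(u,v)\,dv\,du = 1.$

If f is continuous at (x,y), then

    $\frac{\partial^2 F(x,y)}{\partial x\,\partial y} = f(x,y).$    (5)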


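Example 3. Let (X,Y) be an RV with joint PDF (Fig. 3) given by

    $f(x,y) = \begin{cases} e^{-(x+y)}, & 0 < x < \infty,\ 0 < y < \infty, \\ 0, & \text{otherwise.} \end{cases}$

Then

    $F(x,y) = \begin{cases} (1 - e^{-x})(1 - e^{-y}), & 0 < x < \infty,\ 0 < y < \infty, \\ 0, & \text{otherwise.} \end{cases}$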


Theorem 5. If f is a nonnegative function satisfying $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$, then f is
the joint density function of some RV.

Proof. For the proof define

    $F(x,y) = \int_{-\infty}^{x}\left[\int_{-\infty}^{y} f(u,v)\,dv\right]du$

and use Theorem 2.

Fig. 3  $f(x,y) = \exp\{-(x+y)\}$, x > 0, y > 0.


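Let (X,Y) be a two-dimensional RV with PMF

    $p_{ij} = P\{X = x_i, Y = y_j\}.$

Then

    $\sum_{i=1}^{\infty} p_{ij} = \sum_{i=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{Y = y_j\}$    (6)

and

    $\sum_{j=1}^{\infty} p_{ij} = \sum_{j=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{X = x_i\}.$    (7)

Let us write

    $p_{i\cdot} = \sum_{j=1}^{\infty} p_{ij} \quad \text{and} \quad p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij}.$    (8)

Then $p_{i\cdot} \ge 0$ and $\sum_{i=1}^{\infty} p_{i\cdot} = 1$, $p_{\cdot j} \ge 0$ and $\sum_{j=1}^{\infty} p_{\cdot j} = 1$, and $\{p_{i\cdot}\}$, $\{p_{\cdot j}\}$ represent PMFs.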


Definition 6. The collection of numbers $\{p_{i\cdot}\}$ is called the marginal PMF of X, and the
collection $\{p_{\cdot j}\}$, the marginal PMF of Y.

Example 4. A fair coin is tossed three times. Let X = number of heads in three tossings,
and Y = difference, in absolute value, between number of heads and number of tails. The
joint PMF of (X,Y) is given in the following table:

      Y \ X   |   0     1     2     3   | P{Y = y}
    ----------+-------------------------+---------
        1     |   0    3/8   3/8    0   |   6/8
        3     |  1/8    0     0    1/8  |   2/8
    ----------+-------------------------+---------
     P{X = x} |  1/8   3/8   3/8   1/8  |    1

The marginal PMF of Y is shown in the column representing row totals, and the marginal
PMF of X, in the row representing column totals.
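The joint and marginal PMFs of Example 4 can be obtained by brute-force enumeration of the eight equally likely outcomes. The following is a small sketch; the variable names are ours.

```python
from collections import defaultdict
from fractions import Fraction as Fr
from itertools import product

joint = defaultdict(Fr)                        # (x, y) -> P{X = x, Y = y}
for tosses in product("HT", repeat=3):         # 8 equally likely outcomes
    heads = tosses.count("H")
    tails = 3 - heads
    joint[(heads, abs(heads - tails))] += Fr(1, 8)

marg_x, marg_y = defaultdict(Fr), defaultdict(Fr)
for (x, y), p in joint.items():
    marg_x[x] += p                             # column totals: P{X = x}
    marg_y[y] += p                             # row totals:    P{Y = y}

print(dict(joint))
print(dict(marg_x))
print(dict(marg_y))
```

Summing the joint PMF over one index gives the marginal PMF of the other variable, exactly as in (6) and (7).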
If (X,Y) is an RV of the continuous type with PDF f, then

    $f_1(x) = \int_{-\infty}^{\infty} f(x,y)\,dy$    (9)

and

    $f_2(y) = \int_{-\infty}^{\infty} f(x,y)\,dx$    (10)

satisfy $f_1(x) \ge 0$, $f_2(y) \ge 0$, and $\int_{-\infty}^{\infty} f_1(x)\,dx = 1$, $\int_{-\infty}^{\infty} f_2(y)\,dy = 1$. It follows that $f_1(x)$
and $f_2(y)$ are PDFs.

Definition 7. The functions $f_1(x)$ and $f_2(y)$, defined in (9) and (10), are called the marginal
PDF of X and the marginal PDF of Y, respectively.


Example 5. Let (X,Y) be jointly distributed with PDF f(x,y) = 2, 0 < x < y < 1, and = 0
otherwise (Fig. 4). Then

    $f_1(x) = \int_x^1 2\,dy = \begin{cases} 2 - 2x, & 0 < x < 1, \\ 0, & \text{otherwise} \end{cases}$

and

    $f_2(y) = \int_0^y 2\,dx = \begin{cases} 2y, & 0 < y < 1, \\ 0, & \text{otherwise} \end{cases}$

are the two marginal density functions.

Fig. 4  f(x,y) = 2, 0 < x < y < 1.


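Definition 8. Let (X,Y) be an RV with DF F. Then the marginal DF of X is defined by

    $F_1(x) = F(x, \infty) = \lim_{y \to \infty} F(x,y) = \begin{cases} \sum_{x_i \le x} p_{i\cdot} & \text{if } (X,Y) \text{ is discrete}, \\ \int_{-\infty}^{x} f_1(t)\,dt & \text{if } (X,Y) \text{ is continuous.} \end{cases}$    (11)

A similar definition is given for the marginal DF of Y.

In general, given a DF $F(x_1, x_2, \ldots, x_n)$ of an n-dimensional RV $(X_1, X_2, \ldots, X_n)$, one
can obtain any k-dimensional ($1 \le k \le n - 1$) marginal DF from it. Thus the marginal DF
of $(X_{i_1}, X_{i_2}, \ldots, X_{i_k})$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$, is given by

    $\lim_{x_i \to \infty,\ i \ne i_1, \ldots, i_k} F(x_1, x_2, \ldots, x_n) = F(+\infty, \ldots, +\infty, x_{i_1}, +\infty, \ldots, +\infty, \ldots, x_{i_k}, +\infty, \ldots, +\infty).$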


We now consider the concept of conditional distributions. Let (X,Y) be an RV of the
discrete type with PMF $p_{ij} = P\{X = x_i, Y = y_j\}$. The marginal PMFs are $p_{i\cdot} = \sum_{j=1}^{\infty} p_{ij}$ and
$p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij}$. Recall that, if $A, B \in \mathcal{S}$ and PB > 0, the conditional probability of A, given
B, is defined by

    $P\{A \mid B\} = \frac{P(AB)}{P(B)}.$

Take $A = \{X = x_i\} = \{(x_i, y): -\infty < y < \infty\}$ and $B = \{Y = y_j\} = \{(x, y_j): -\infty < x < \infty\}$,
and assume that $PB = P\{Y = y_j\} = p_{\cdot j} > 0$. Then $A \cap B = \{X = x_i, Y = y_j\}$ and

    $P\{A \mid B\} = P\{X = x_i \mid Y = y_j\} = \frac{p_{ij}}{p_{\cdot j}}.$

For fixed j, the function $P\{X = x_i \mid Y = y_j\} \ge 0$ and $\sum_{i=1}^{\infty} P\{X = x_i \mid Y = y_j\} = 1$. Thus
$P\{X = x_i \mid Y = y_j\}$, for fixed j, defines a PMF.

Definition 9. Let (X,Y) be an RV of the discrete type. If $P\{Y = y_j\} > 0$, the function

    $P\{X = x_i \mid Y = y_j\} = \frac{P\{X = x_i, Y = y_j\}}{P\{Y = y_j\}}$    (12)

for fixed j is known as the conditional PMF of X, given $Y = y_j$. A similar definition
is given for $P\{Y = y_j \mid X = x_i\}$, the conditional PMF of Y, given $X = x_i$, provided that
$P\{X = x_i\} > 0$.


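Example 6. For the joint PMF of Example 4, we have for Y = 1

$$P\{X = i \mid Y = 1\} = \begin{cases} 0, & i = 0,3,\\ \tfrac{1}{2}, & i = 1,2. \end{cases}$$

Similarly

$$P\{X = i \mid Y = 3\} = \begin{cases} \tfrac{1}{2}, & \text{if } i = 0,3,\\ 0, & \text{if } i = 1,2, \end{cases} \qquad P\{Y = j \mid X = 0\} = \begin{cases} 0, & \text{if } j = 1,\\ 1, & \text{if } j = 3, \end{cases}$$

and so on.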
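
The computation in (12) can be sketched in a few lines of Python. The joint PMF below is an assumption: it is a hypothetical table chosen only so that it reproduces the conditional values quoted in Example 6, and the function names are illustrative.

```python
from fractions import Fraction

# Hypothetical joint PMF, joint[(x, y)] = P{X = x, Y = y}. The values are an
# assumption chosen to agree with the conditional PMFs quoted in Example 6.
joint = {
    (0, 3): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
    (2, 1): Fraction(3, 8),
    (3, 3): Fraction(1, 8),
}


def marginal_y(joint_pmf, y):
    """P{Y = y}: sum the joint PMF over all x."""
    return sum(p for (_, yj), p in joint_pmf.items() if yj == y)


def conditional_x_given_y(joint_pmf, y):
    """Conditional PMF of X given Y = y, as in (12); needs P{Y = y} > 0."""
    p_y = marginal_y(joint_pmf, y)
    if p_y == 0:
        raise ValueError("conditioning event has probability zero")
    # Points with zero joint probability are simply absent from the result.
    return {x: p / p_y for (x, yj), p in joint_pmf.items() if yj == y}
```

Exact rational arithmetic is used so that the conditional probabilities sum to exactly 1 for each fixed value of Y.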


Next suppose that (X,Y) is an RV of the continuous type with joint PDF f. Since
$P\{X = x\} = 0$, $P\{Y = y\} = 0$ for any x, y, the probability $P\{X \le x \mid Y = y\}$ or
$P\{Y \le y \mid X = x\}$ is not defined. Let $\varepsilon > 0$, and suppose that $P\{y-\varepsilon < Y \le y+\varepsilon\} > 0$. For
every x and every interval $(y-\varepsilon, y+\varepsilon]$, consider the conditional probability of the event
$\{X \le x\}$, given that $Y \in (y-\varepsilon, y+\varepsilon]$. We have

$$P\{X \le x \mid y-\varepsilon < Y \le y+\varepsilon\} = \frac{P\{X \le x,\ y-\varepsilon < Y \le y+\varepsilon\}}{P\{Y \in (y-\varepsilon, y+\varepsilon]\}}.$$

For any fixed interval $(y-\varepsilon, y+\varepsilon]$, the above expression defines the conditional DF of X,
given that $Y \in (y-\varepsilon, y+\varepsilon]$, provided that $P\{Y \in (y-\varepsilon, y+\varepsilon]\} > 0$. We shall be interested
in the case where the limit

$$\lim_{\varepsilon \to 0+} P\{X \le x \mid Y \in (y-\varepsilon, y+\varepsilon]\}$$

exists.

Definition 10. The conditional DF of an RV X, given Y = y, is defined as the limit

$$\lim_{\varepsilon \to 0+} P\{X \le x \mid Y \in (y-\varepsilon, y+\varepsilon]\}, \tag{13}$$

provided that the limit exists. If the limit exists, we denote it by $F_{X|Y}(x \mid y)$ and define
the conditional density function of X, given Y = y, $f_{X|Y}(x \mid y)$, as a nonnegative function
satisfying

$$F_{X|Y}(x \mid y) = \int_{-\infty}^{x} f_{X|Y}(t \mid y)\,dt \quad \text{for all } x \in \mathcal{R}. \tag{14}$$

For fixed y, we see that $f_{X|Y}(x \mid y) \ge 0$ and $\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\,dx = 1$. Thus $f_{X|Y}(x \mid y)$ is a PDF
for fixed y.

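Suppose that (X,Y) is an RV of the continuous type with PDF f. At every point (x,y)
where f is continuous and the marginal PDF $f_2(y) > 0$ and is continuous, we have

$$F_{X|Y}(x \mid y) = \lim_{\varepsilon \to 0+} \frac{P\{X \le x,\ Y \in (y-\varepsilon, y+\varepsilon]\}}{P\{Y \in (y-\varepsilon, y+\varepsilon]\}} = \lim_{\varepsilon \to 0+} \frac{\int_{-\infty}^{x} \left\{ \int_{y-\varepsilon}^{y+\varepsilon} f(u,v)\,dv \right\} du}{\int_{y-\varepsilon}^{y+\varepsilon} f_2(v)\,dv}.$$

Dividing numerator and denominator by $2\varepsilon$ and passing to the limit as $\varepsilon \to 0+$, we have

$$F_{X|Y}(x \mid y) = \frac{\int_{-\infty}^{x} f(u,y)\,du}{f_2(y)} = \int_{-\infty}^{x} \left\{ \frac{f(u,y)}{f_2(y)} \right\} du.$$

It follows that there exists a conditional PDF of X, given Y = y, that is expressed by

$$f_{X|Y}(x \mid y) = \frac{f(x,y)}{f_2(y)}, \quad f_2(y) > 0.$$

We have thus proved the following theorem.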


Theorem 6. Let f be the PDF of an RV (X,Y) of the continuous type, and let $f_2$ be the
marginal PDF of Y. At every point (x,y) at which f is continuous and $f_2(y) > 0$ and is
continuous, the conditional PDF of X, given Y = y, exists and is expressed by

$$f_{X|Y}(x \mid y) = \frac{f(x,y)}{f_2(y)}. \tag{15}$$

Note that

$$\int_{-\infty}^{x} f(u,y)\,du = f_2(y)\,F_{X|Y}(x \mid y),$$

so that

$$F_1(x) = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{x} f(u,y)\,du \right\} dy = \int_{-\infty}^{\infty} f_2(y)\,F_{X|Y}(x \mid y)\,dy, \tag{16}$$

where $F_1$ is the marginal DF of X.


It is clear that similar definitions may be made for the conditional DF and conditional 
PDF of the RV Y, given X = x, and an analog of Theorem 6 holds. 

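In the general case, let $(X_1,X_2,\ldots,X_n)$ be an n-dimensional RV of the continuous type
with PDF $f_{X_1,X_2,\ldots,X_n}(x_1,x_2,\ldots,x_n)$. Also, let $\{i_1 < i_2 < \cdots < i_k,\ j_1 < j_2 < \cdots < j_l\}$ be a
subset of $\{1,2,\ldots,n\}$. Then

$$F(x_{i_1},\ldots,x_{i_k} \mid x_{j_1},\ldots,x_{j_l}) = \frac{\int_{-\infty}^{x_{i_1}} \cdots \int_{-\infty}^{x_{i_k}} f_{X_{i_1},\ldots,X_{i_k},X_{j_1},\ldots,X_{j_l}}(u_{i_1},\ldots,u_{i_k},x_{j_1},\ldots,x_{j_l}) \prod_{p=1}^{k} du_{i_p}}{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_{i_1},\ldots,X_{i_k},X_{j_1},\ldots,X_{j_l}}(u_{i_1},\ldots,u_{i_k},x_{j_1},\ldots,x_{j_l}) \prod_{p=1}^{k} du_{i_p}}, \tag{17}$$

provided that the denominator exceeds 0. Here $f_{X_{i_1},\ldots,X_{i_k},X_{j_1},\ldots,X_{j_l}}$ is the joint marginal PDF
of $(X_{i_1},X_{i_2},\ldots,X_{i_k},X_{j_1},X_{j_2},\ldots,X_{j_l})$. The conditional densities are obtained in a similar
manner.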


The case in which (X1,X2,...,X,) is of the discrete type is similarly treated. 


Example 7. For the joint PDF of Example 5 we have

$$f_{Y|X}(y \mid x) = \frac{f(x,y)}{f_1(x)} = \frac{1}{1-x}, \quad x < y < 1,$$

so that the conditional PDF $f_{Y|X}$ is uniform on (x, 1). Also,

$$f_{X|Y}(x \mid y) = \frac{1}{y}, \quad 0 < x < y,$$

which is uniform on (0, y).


We conclude this section with a discussion of a technique called truncation. We consider
two types of truncation, each with a different objective. In probabilistic modeling we
use truncated distributions when sampling from an incomplete population.

Definition 11. Let X be an RV on $(\Omega, \mathcal{S}, P)$, and $T \in \mathfrak{B}$ such that $0 < P\{X \in T\} \le 1$.
Then the conditional distribution $P\{X \le x \mid X \in T\}$, defined for any real x, is called the
truncated distribution of X.

If X is a discrete RV with PMF $p_i = P\{X = x_i\}$, $i = 1,2,\ldots$, the truncated distribution
of X is given by

$$P\{X = x_i \mid X \in T\} = \frac{P\{X = x_i,\ X \in T\}}{P\{X \in T\}} = \begin{cases} \dfrac{p_i}{\sum_{x_j \in T} p_j} & \text{if } x_i \in T,\\ 0 & \text{otherwise.} \end{cases} \tag{18}$$

If X is of the continuous type with PDF f, then

$$P\{X \le x \mid X \in T\} = \frac{P\{X \le x,\ X \in T\}}{P\{X \in T\}} = \frac{\int_{(-\infty,x] \cap T} f(y)\,dy}{\int_T f(y)\,dy}. \tag{19}$$

The PDF of the truncated distribution is given by

$$h(x) = \begin{cases} \dfrac{f(x)}{\int_T f(y)\,dy}, & x \in T,\\ 0, & x \notin T. \end{cases} \tag{20}$$

Here T is not necessarily a bounded set of real numbers. If we write Y for the RV with
distribution function $P\{X \le x \mid X \in T\}$, then Y has support T.
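Example 8. Let X be an RV with standard normal PDF

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty.$$

Let $T = (-\infty, 0]$. Then $P\{X \in T\} = 1/2$, since X is symmetric and continuous. For the
truncated PDF, we have from (20)

$$h(x) = \begin{cases} 2f(x) = \dfrac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, & -\infty < x \le 0,\\ 0, & x > 0. \end{cases}$$

Some other examples are the truncated Poisson distribution

$$P\{X = k\} = \frac{e^{-\lambda} \lambda^k}{k!\,(1 - e^{-\lambda})}, \quad k = 1,2,\ldots,$$

where $T = \{X \ge 1\}$, and the truncated uniform distribution

$$f(x) = 1/\theta, \quad 0 < x < \theta, \quad \text{and } = 0 \text{ otherwise,}$$

where $T = \{X < \theta\}$, $\theta > 0$.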
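
As a quick numerical illustration of (18), the following sketch builds the Poisson PMF truncated to $T = \{X \ge 1\}$. The function names and the choice $\lambda = 2$ in the usage note are illustrative assumptions, not part of the text.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson PMF P{X = k} for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zero_truncated_poisson_pmf(k, lam):
    """PMF of X given X >= 1, obtained from (18) with T = {X >= 1}.

    The normalizing constant is P{X in T} = 1 - P{X = 0} = 1 - exp(-lam).
    """
    if k < 1:
        return 0.0
    return poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
```

For example, summing `zero_truncated_poisson_pmf(k, 2.0)` over k = 1, ..., 100 gives 1 up to rounding error, as a PMF must.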


The second type of truncation is very useful in probability limit theory, especially when
the DF F in question does not have a finite mean. Let a < b be finite real numbers. Define
the RV $X^*$ by

$$X^* = \begin{cases} X, & \text{if } a \le X \le b,\\ 0, & \text{if } X < a, \text{ or } X > b. \end{cases}$$

This method produces an RV for which $P(a \le X^* \le b) = 1$ so that $X^*$ has moments of all
orders. The special case when b = c > 0 and a = -c is quite useful in probability limit
theory when we wish to approximate X through bounded RVs. We say that $X^c$ is X truncated
at c if $X^c = X$ for $|X| \le c$, and $= 0$ for $|X| > c$. Then $E|X^c|^k \le c^k$. Moreover,

$$P\{X \ne X^c\} = P\{|X| > c\}$$

so that c can be selected sufficiently large to make $P\{|X| > c\}$ arbitrarily small. For
example, if $E|X|^2 < \infty$ then

$$P\{|X| > c\} \le E|X|^2 / c^2$$

and given $\varepsilon > 0$, we can choose c such that $E|X|^2/c^2 < \varepsilon$.
The distribution of $X^c$ is no longer the truncated distribution $P\{X \le x \mid |X| \le c\}$. In fact,

$$F^c(y) = \begin{cases} 0, & y < -c,\\ F(y) - F(-c), & -c \le y < 0,\\ 1 - F(c) + F(y), & 0 \le y < c,\\ 1, & y \ge c, \end{cases}$$

where F is the DF of X and $F^c$ that of $X^c$.
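A third type of truncation, sometimes called Winsorization, sets

$$X^* = X \text{ if } a < X < b, \quad = a \text{ if } X \le a, \quad \text{and } = b \text{ if } X \ge b.$$

This method also produces an RV for which $P(a \le X^* \le b) = 1$, and moments of all orders
for $X^*$ exist, but its DF is given by

$$F^*(y) = 0 \text{ for } y < a, \quad = F(y) \text{ for } a \le y < b, \quad = 1 \text{ for } y \ge b.$$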
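
The difference between truncation at c and Winsorization is easy to see on data. The sketch below applies both to draws from a standard Cauchy distribution, which has no finite mean; the sampler, the seed, and the function names are illustrative assumptions.

```python
import math
import random


def truncate_at_c(x, c):
    """X truncated at c: keep x when |x| <= c, otherwise replace it by 0."""
    return x if abs(x) <= c else 0.0


def winsorize(x, a, b):
    """Winsorization: pull values below a up to a and values above b down to b."""
    return min(max(x, a), b)


def cauchy_sample(n, seed=0):
    """Standard Cauchy draws via the inverse DF; a heavy-tailed test input."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
```

Both transformed samples are bounded in absolute value by c (or lie in [a, b]), so their sample moments are finite and stable, whereas the raw Cauchy sample mean does not settle down as n grows.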


PROBLEMS 4.2 


1. Let $F(x,y) = 1$ if $x + 2y \ge 1$, and $= 0$ if $x + 2y < 1$. Does F define a DF in the
plane?

2. Let T be a closed triangle in the plane with vertices $(0,0)$, $(0,\sqrt{2})$, and $(\sqrt{2},\sqrt{2})$.
Let $F(x,y)$ denote the elementary area of the intersection of T with $\{(x_1,x_2): x_1 \le
x, x_2 \le y\}$. Show that F defines a DF in the plane, and find its marginal DFs.

3. Let (X,Y) have the joint PDF f defined by $f(x,y) = 1/2$ inside the square with
corners at the points (1,0), (0,1), (-1,0), and (0,-1) in the (x,y)-plane, and = 0
otherwise. Find the marginal PDFs of X and Y and the two conditional PDFs.

4. Let $f(x,y,z) = e^{-x-y-z}$, $x > 0$, $y > 0$, $z > 0$, and $= 0$ otherwise, be the joint PDF
of (X,Y,Z). Compute $P\{X < Y < Z\}$ and $P\{X = Y < Z\}$.

5. Let (X,Y) have the joint PDF $f(x,y) = \frac{3}{4}[xy + (x^2/2)]$ if $0 < x < 1$, $0 < y < 2$, and
= 0 otherwise. Find $P\{Y < 1 \mid X < 1/2\}$.

6. For DFs $F, F_1, F_2, \ldots, F_n$ show that

$$1 - \sum_{j=1}^{n} \{1 - F_j(x_j)\} \le F(x_1,x_2,\ldots,x_n) \le \min_{1 \le j \le n} F_j(x_j)$$

for all real numbers $x_1,x_2,\ldots,x_n$ if and only if the $F_j$'s are marginal DFs of F.
7. For the bivariate negative binomial distribution

$$P\{X = x, Y = y\} = \frac{(x+y+k-1)!}{x!\,y!\,(k-1)!}\, p_1^x p_2^y (1 - p_1 - p_2)^k,$$

where $x, y = 0,1,2,\ldots$, $k \ge 1$ is an integer, $0 < p_1 < 1$, $0 < p_2 < 1$, and $p_1 + p_2 < 1$,
find the marginal PMFs of X and Y and the conditional distributions.


In Problems 8-10 the bivariate distributions considered are not unique generalizations 
of the corresponding univariate distributions. 


8. For the bivariate Cauchy RV (X,Y) with PDF

$$f(x,y) = \frac{c}{2\pi}\,(c^2 + x^2 + y^2)^{-3/2}, \quad -\infty < x < \infty,\ -\infty < y < \infty,\ c > 0,$$

find the marginal PDFs of X and Y. Find the conditional PDF of Y given X = x.

9. For the bivariate beta RV (X,Y) with PDF

$$f(x,y) = \frac{\Gamma(p_1 + p_2 + p_3)}{\Gamma(p_1)\Gamma(p_2)\Gamma(p_3)}\, x^{p_1 - 1} y^{p_2 - 1} (1 - x - y)^{p_3 - 1}, \quad x \ge 0,\ y \ge 0,\ x + y \le 1,$$

where $p_1, p_2, p_3$ are positive real numbers, find the marginal PDFs of X and Y and
the conditional PDFs. Find also the conditional PDF of $Y/(1-X)$, given X = x.


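10. For the bivariate gamma RV (X,Y) with PDF

$$f(x,y) = \frac{\beta^{\alpha+\gamma}}{\Gamma(\alpha)\Gamma(\gamma)}\, x^{\alpha-1} (y-x)^{\gamma-1} e^{-\beta y}, \quad 0 < x < y;\ \alpha, \beta, \gamma > 0,$$

find the marginal PDFs of X and Y and the conditional PDFs. Also, find the conditional
PDF of $Y - X$, given X = x, and the conditional distribution of $X/Y$, given
Y = y.

11. For the bivariate hypergeometric RV (X,Y) with PMF

$$P\{X = x, Y = y\} = \binom{N}{n}^{-1} \binom{Np_1}{x} \binom{Np_2}{y} \binom{N - Np_1 - Np_2}{n - x - y}, \quad x, y = 0,1,2,\ldots,n,$$

where $x \le Np_1$, $y \le Np_2$, $n - x - y \le N(1 - p_1 - p_2)$, N, n integers with $n \le N$, and
$0 < p_1 < 1$, $0 < p_2 < 1$ so that $p_1 + p_2 \le 1$, find the marginal PMFs of X and Y and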
the conditional PMFs.

12. Let X be an RV with PDF $f(x) = 1$ if $0 \le x \le 1$, and $= 0$ otherwise. Let $T =
\{x: 1/3 < x \le 1/2\}$. Find the PDF of the truncated distribution of X, its mean,
and its variance.

13. Let X be an RV with PMF

$$P\{X = x\} = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x = 0,1,2,\ldots,\ \lambda > 0.$$

Suppose that the value x = 0 cannot be observed. Find the PMF of the truncated
RV, its mean, and its variance.

14. Is the function

$$f(x,y,z,u) = \begin{cases} \exp(-u), & 0 < x < y < z < u < \infty,\\ 0, & \text{elsewhere} \end{cases}$$

a joint density function? If so, find $P(X \le 7)$, where (X,Y,Z,U) is a random
variable with density f.

15. Show that the function defined by

$$f(x,y,z,u) = \frac{24}{(1 + x + y + z + u)^5}, \quad x > 0,\ y > 0,\ z > 0,\ u > 0,$$

and 0 elsewhere is a joint density function.

(a) Find $P(X > Y < Z > U)$.

(b) Find $P(X + Y + Z + U \ge 1)$.

16. Let (X,Y) have joint density function f and joint distribution function F. Suppose
that

$$f(x_1,y_1) f(x_2,y_2) \le f(x_1,y_2) f(x_2,y_1)$$

holds for $x_1 \le a \le x_2$ and $y_1 \le b \le y_2$. Show that

$$F(a,b) \le F_1(a) F_2(b).$$

17. Suppose (X,Y,Z) are jointly distributed with density

$$f(x,y,z) = \begin{cases} g(x)g(y)g(z), & x > 0,\ y > 0,\ z > 0,\\ 0, & \text{elsewhere.} \end{cases}$$

Find $P(X > Y > Z)$. Hence find the probability that $(x,y,z) \in \{X > Y > Z\}$ or
$\{X < Y < Z\}$. (Here g is a density function on $\mathcal{R}$.)


4.3 INDEPENDENT RANDOM VARIABLES 


We recall that the joint distribution of a multiple RV uniquely determines the marginal
distributions of the component random variables, but, in general, knowledge of marginal
distributions is not enough to determine the joint distribution. Indeed, it is quite possible
to have an infinite collection of joint densities $f_\alpha$ with given marginal densities.

Example 1. (Gumbel [39]). Let $f_1, f_2, f_3$ be three PDFs with corresponding DFs $F_1, F_2, F_3$,
and let $\alpha$ be a constant, $|\alpha| \le 1$. Define

$$f_\alpha(x_1,x_2,x_3) = f_1(x_1) f_2(x_2) f_3(x_3) \{1 + \alpha[2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1]\}.$$

We show that $f_\alpha$ is a PDF for each $\alpha$ in $[-1,1]$ and that the collection of densities
$\{f_\alpha; -1 \le \alpha \le 1\}$ has the same marginal densities $f_1, f_2, f_3$. First note that

$$|[2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1]| \le 1,$$

so that

$$1 + \alpha[2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1] \ge 0.$$

Also,

$$\iiint f_\alpha(x_1,x_2,x_3)\,dx_1\,dx_2\,dx_3 = 1 + \alpha \left( \int [2F_1(x_1) - 1] f_1(x_1)\,dx_1 \right) \left( \int [2F_2(x_2) - 1] f_2(x_2)\,dx_2 \right) \left( \int [2F_3(x_3) - 1] f_3(x_3)\,dx_3 \right)$$

$$= 1 + \alpha \left\{ \left[ F_1^2(x_1) \big|_{-\infty}^{\infty} - 1 \right] \left[ F_2^2(x_2) \big|_{-\infty}^{\infty} - 1 \right] \left[ F_3^2(x_3) \big|_{-\infty}^{\infty} - 1 \right] \right\} = 1.$$

It follows that $f_\alpha$ is a density function. That $f_1, f_2, f_3$ are the marginal densities of $f_\alpha$ follows
similarly.
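
A small numerical check makes the construction concrete. The sketch below specializes Gumbel's family to uniform marginals on (0,1), so that $f_i = 1$ and $F_i(x) = x$ there; this choice of marginals, the midpoint-rule grid, and the function names are assumptions made only for illustration.

```python
def f_alpha(x1, x2, x3, alpha):
    """Gumbel-type density on the unit cube with uniform(0,1) marginals."""
    return 1.0 + alpha * (2 * x1 - 1) * (2 * x2 - 1) * (2 * x3 - 1)


def midpoints(n):
    """Midpoints of n equal subintervals of (0, 1)."""
    return [(i + 0.5) / n for i in range(n)]


def total_mass(alpha, n=20):
    """Midpoint-rule approximation of the triple integral of f_alpha."""
    pts = midpoints(n)
    return sum(f_alpha(a, b, c, alpha) for a in pts for b in pts for c in pts) / n**3


def marginal_x1(x1, alpha, n=20):
    """Midpoint-rule approximation of the marginal density of X1 at x1."""
    pts = midpoints(n)
    return sum(f_alpha(x1, b, c, alpha) for b in pts for c in pts) / n**2
```

For every $\alpha$ in $[-1,1]$ the total mass is 1 and the marginal density of $X_1$ is identically 1, even though the joint densities differ with $\alpha$.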


In this section we deal with a very special class of distributions in which the marginal 
distributions uniquely determine the joint distribution of a multiple RV. First we consider 
the bivariate case. 

Let $F(x,y)$ and $F_1(x)$, $F_2(y)$, respectively, be the joint DF of (X,Y) and the marginal
DFs of X and Y.

Definition 1. We say that X and Y are independent if and only if

$$F(x,y) = F_1(x) F_2(y) \quad \text{for all } (x,y) \in \mathcal{R}_2. \tag{1}$$

Lemma 1. If X and Y are independent and $a < c$, $b < d$ are real numbers, then

$$P\{a < X \le c,\ b < Y \le d\} = P\{a < X \le c\}\, P\{b < Y \le d\}. \tag{2}$$


Theorem 1. (a) A necessary and sufficient condition for RVs X, Y of the discrete type
to be independent is that

$$P\{X = x_i, Y = y_j\} = P\{X = x_i\}\, P\{Y = y_j\} \tag{3}$$


116 MULTIPLE RANDOM VARIABLES 


for all pairs $(x_i, y_j)$. (b) Two RVs X and Y of the continuous type are independent if
and only if

$$f(x,y) = f_1(x) f_2(y) \quad \text{for all } (x,y) \in \mathcal{R}_2, \tag{4}$$

where $f, f_1, f_2$, respectively, are the joint and marginal densities of X and Y, and f is
everywhere continuous.


Proof. (a) Let X, Y be independent. Then from Lemma 1, letting $a \uparrow c$ and $b \uparrow d$, we
get

$$P\{X = c, Y = d\} = P\{X = c\}\, P\{Y = d\}.$$

Conversely,

$$F(x,y) = \sum_{B} P\{X = x_i, Y = y_j\},$$

where

$$B = \{(i,j): x_i \le x,\ y_j \le y\}.$$

Then

$$F(x,y) = \sum_{B} P\{X = x_i\}\, P\{Y = y_j\} = \sum_{x_i \le x} \left[ \sum_{y_j \le y} P\{Y = y_j\} \right] P\{X = x_i\} = F_1(x) F_2(y).$$

The proof of part (b) is left as an exercise.


Corollary. Let X and Y be independent RVs. Then $F_{Y|X}(y \mid x) = F_Y(y)$ for all y, and
$F_{X|Y}(x \mid y) = F_X(x)$ for all x.


Theorem 2. The RVs X and Y are independent if and only if

$$P\{X \in A_1, Y \in A_2\} = P\{X \in A_1\}\, P\{Y \in A_2\} \tag{5}$$

for all Borel sets $A_1$ on the x-axis and $A_2$ on the y-axis.


Theorem 3. Let X and Y be independent RVs and f and g be Borel-measurable functions.
Then f(X) and g(Y) are also independent.

Proof. We have

$$P\{f(X) \le x,\ g(Y) \le y\} = P\{X \in f^{-1}(-\infty,x],\ Y \in g^{-1}(-\infty,y]\}$$
$$= P\{X \in f^{-1}(-\infty,x]\}\, P\{Y \in g^{-1}(-\infty,y]\}$$
$$= P\{f(X) \le x\}\, P\{g(Y) \le y\}.$$

Note that a degenerate RV is independent of any RV.
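Example 2. Let X and Y be jointly distributed with PDF

$$f(x,y) = \begin{cases} \dfrac{1 + xy}{4}, & |x| < 1,\ |y| < 1,\\ 0, & \text{otherwise.} \end{cases}$$

Then X and Y are not independent since $f_1(x) = 1/2$, $|x| < 1$, and $f_2(y) = 1/2$, $|y| < 1$ are the
marginal densities of X and Y, respectively. However, the RVs $X^2$ and $Y^2$ are independent.
Indeed, for $0 \le u \le 1$ and $0 \le v \le 1$,

$$P\{X^2 \le u,\ Y^2 \le v\} = \int_{-v^{1/2}}^{v^{1/2}} \int_{-u^{1/2}}^{u^{1/2}} f(x,y)\,dx\,dy$$
$$= \frac{1}{4} \int_{-v^{1/2}}^{v^{1/2}} \left[ \int_{-u^{1/2}}^{u^{1/2}} (1 + xy)\,dx \right] dy$$
$$= u^{1/2} v^{1/2}$$
$$= P\{X^2 \le u\}\, P\{Y^2 \le v\}.$$

Note that $\varphi(X^2)$ and $\psi(Y^2)$ are independent where $\varphi$ and $\psi$ are Borel-measurable
functions. But X is not a Borel-measurable function of $X^2$.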


Example 3. We return to Buffon's needle problem, discussed in Examples 1.2.9 and 1.3.7.
Suppose that the RV R, which represents the distance from the center of the needle to the
nearest line, is uniformly distributed on $(0, l]$. Suppose further that $\Theta$, the angle that the
needle forms with this line, is uniformly distributed on $[0, \pi)$. If R and $\Theta$ are assumed to
be independent, the joint PDF is given by

$$f_{R,\Theta}(r,\theta) = f_R(r) f_\Theta(\theta) = \begin{cases} \dfrac{1}{l} \cdot \dfrac{1}{\pi} & \text{if } 0 < r \le l,\ 0 \le \theta < \pi,\\ 0 & \text{otherwise.} \end{cases}$$

The needle will intersect the nearest line if and only if

$$\frac{l}{2} \sin\Theta \ge R.$$

Therefore, the required probability is given by

$$P\left\{ \sin\Theta \ge \frac{2R}{l} \right\} = \int_0^{\pi} \int_0^{(l/2)\sin\theta} f_{R,\Theta}(r,\theta)\,dr\,d\theta = \frac{1}{l\pi} \int_0^{\pi} \frac{l}{2} \sin\theta\,d\theta = \frac{1}{\pi}.$$

Definition 2. A collection of jointly distributed RVs $X_1,X_2,\ldots,X_n$ is said to be mutually
or completely independent if and only if

$$F(x_1,x_2,\ldots,x_n) = \prod_{i=1}^{n} F_i(x_i) \quad \text{for all } (x_1,x_2,\ldots,x_n) \in \mathcal{R}_n, \tag{6}$$

where F is the joint DF of $(X_1,X_2,\ldots,X_n)$, and $F_i$ ($i = 1,2,\ldots,n$) is the marginal DF of
$X_i$. The RVs $X_1,\ldots,X_n$ are said to be pairwise independent if and only if every pair of them
is independent.


It is clear that an analog of Theorem 1 holds, but we leave the reader to construct it.


Example 4. In Example 1 we cannot write

$$f_\alpha(x_1,x_2,x_3) = f_1(x_1) f_2(x_2) f_3(x_3)$$

except when $\alpha = 0$. It follows that $X_1$, $X_2$, and $X_3$ are not independent except when $\alpha = 0$.
The following result is easy to prove.


Theorem 4. If $X_1,X_2,\ldots,X_n$ are independent, every subcollection $X_{i_1},X_{i_2},\ldots,X_{i_k}$ of
$X_1,X_2,\ldots,X_n$ is also independent.


Remark 1. It is quite possible for RVs $X_1,X_2,\ldots,X_n$ to be pairwise independent without
being mutually independent. Let (X,Y,Z) have the joint PMF defined by

$$P\{X = x, Y = y, Z = z\} = \begin{cases} \dfrac{3}{16} & \text{if } (x,y,z) \in \{(0,0,0), (0,1,1), (1,0,1), (1,1,0)\},\\[4pt] \dfrac{1}{16} & \text{if } (x,y,z) \in \{(0,0,1), (0,1,0), (1,0,0), (1,1,1)\}. \end{cases}$$

Clearly, X, Y, Z are not independent (why?). We have

$$P\{X = x, Y = y\} = \tfrac{1}{4}, \quad (x,y) \in \{(0,0), (0,1), (1,0), (1,1)\},$$
$$P\{Y = y, Z = z\} = \tfrac{1}{4}, \quad (y,z) \in \{(0,0), (0,1), (1,0), (1,1)\},$$
$$P\{X = x, Z = z\} = \tfrac{1}{4}, \quad (x,z) \in \{(0,0), (0,1), (1,0), (1,1)\},$$
$$P\{X = x\} = \tfrac{1}{2}, \quad x = 0,\ x = 1,$$
$$P\{Y = y\} = \tfrac{1}{2}, \quad y = 0,\ y = 1,$$
$$P\{Z = z\} = \tfrac{1}{2}, \quad z = 0,\ z = 1.$$

It follows that X and Y, Y and Z, and X and Z are pairwise independent.
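
The two properties claimed in Remark 1 can be verified mechanically. The sketch below assumes the values 3/16 and 1/16 for the two groups of sample points; any pair of distinct values p, q with p + q = 1/4 would behave the same way. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed probabilities for the two groups of points (p + q = 1/4, p != q).
p, q = Fraction(3, 16), Fraction(1, 16)
group_p = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
pmf = {pt: (p if pt in group_p else q) for pt in product((0, 1), repeat=3)}


def marginal(joint_pmf, keep):
    """Marginal PMF of the coordinates listed in 'keep' (a tuple of indices)."""
    out = {}
    for pt, pr in joint_pmf.items():
        key = tuple(pt[i] for i in keep)
        out[key] = out.get(key, 0) + pr
    return out


def pairwise_independent(joint_pmf):
    """True if every pair of coordinates factors into its one-dimensional marginals."""
    singles = [marginal(joint_pmf, (i,)) for i in range(3)]
    for i, j in ((0, 1), (0, 2), (1, 2)):
        pair = marginal(joint_pmf, (i, j))
        for (a, b), pr in pair.items():
            if pr != singles[i][(a,)] * singles[j][(b,)]:
                return False
    return True


def mutually_independent(joint_pmf):
    """True if the joint PMF is the product of the three one-dimensional marginals."""
    singles = [marginal(joint_pmf, (i,)) for i in range(3)]
    return all(
        pr == singles[0][(pt[0],)] * singles[1][(pt[1],)] * singles[2][(pt[2],)]
        for pt, pr in joint_pmf.items()
    )
```

Here every one-dimensional marginal equals 1/2 and every two-dimensional marginal equals 1/4, while the product of the three marginals is 1/8 at each point, which differs from both 3/16 and 1/16.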


Definition 3. A sequence $\{X_n\}$ of RVs is said to be independent if for every $n = 2,3,4,\ldots$
the RVs $X_1,X_2,\ldots,X_n$ are independent.


Similarly, one can speak of an independent family of RVs.


Definition 4. We say that RVs X and Y are identically distributed if X and Y have the
same DF, that is,

$$F_X(x) = F_Y(x) \quad \text{for all } x \in \mathcal{R},$$

where $F_X$ and $F_Y$ are the DFs of X and Y, respectively.

Definition 5. We say that $\{X_n\}$ is a sequence of independent, identically distributed
(iid) RVs with common law $\mathcal{L}(X)$ if $\{X_n\}$ is an independent sequence of RVs and the
distribution of $X_n$ ($n = 1,2,\ldots$) is the same as that of X.

According to Definition 4, X and Y are identically distributed if and only if they have
the same distribution. It does not follow that X = Y with probability 1 (see Problem 7). If
$P\{X = Y\} = 1$, we say that X and Y are equivalent RVs. All Definition 4 says is that X
and Y are identically distributed if and only if

$$P\{X \in A\} = P\{Y \in A\} \quad \text{for all } A \in \mathfrak{B}.$$

Nothing is said about the equality of events $\{X \in A\}$ and $\{Y \in A\}$.


Definition 6. Two multiple RVs $(X_1,X_2,\ldots,X_m)$ and $(Y_1,Y_2,\ldots,Y_n)$ are said to be
independent if

$$F(x_1,x_2,\ldots,x_m,y_1,y_2,\ldots,y_n) = F_1(x_1,x_2,\ldots,x_m)\, F_2(y_1,y_2,\ldots,y_n) \tag{7}$$

for all $(x_1,x_2,\ldots,x_m,y_1,y_2,\ldots,y_n) \in \mathcal{R}_{m+n}$, where $F$, $F_1$, $F_2$ are the joint distribution functions
of $(X_1,X_2,\ldots,X_m,Y_1,Y_2,\ldots,Y_n)$, $(X_1,X_2,\ldots,X_m)$, and $(Y_1,Y_2,\ldots,Y_n)$, respectively.


Of course, the independence of $(X_1,X_2,\ldots,X_m)$ and $(Y_1,Y_2,\ldots,Y_n)$ does not imply the
independence of components $X_1,X_2,\ldots,X_m$ of X or components $Y_1,Y_2,\ldots,Y_n$ of Y.


Theorem 5. Let $X = (X_1,X_2,\ldots,X_m)$ and $Y = (Y_1,Y_2,\ldots,Y_n)$ be independent RVs.
Then the component $X_j$ of X ($j = 1,2,\ldots,m$) and the component $Y_k$ of Y ($k = 1,2,\ldots,n$)
are independent RVs. If h and g are Borel-measurable functions, $h(X_1,X_2,\ldots,X_m)$ and
$g(Y_1,Y_2,\ldots,Y_n)$ are independent.


Remark 2. It is possible that an RV X may be independent of Y and also of Z, but X may
not be independent of the random vector (Y,Z). See the example in Remark 1.


Let $X_1,X_2,\ldots,X_n$ be independent and identically distributed RVs with common DF F.
Then the joint DF G of $(X_1,X_2,\ldots,X_n)$ is given by

$$G(x_1,x_2,\ldots,x_n) = \prod_{j=1}^{n} F(x_j).$$

We note that for any of the n! permutations $(x_{i_1},x_{i_2},\ldots,x_{i_n})$ of $(x_1,x_2,\ldots,x_n)$

$$G(x_1,x_2,\ldots,x_n) = \prod_{j=1}^{n} F(x_{i_j}) = G(x_{i_1},x_{i_2},\ldots,x_{i_n})$$

so that G is a symmetric function of $x_1,x_2,\ldots,x_n$. Thus $(X_1,X_2,\ldots,X_n) \overset{d}{=} (X_{i_1},X_{i_2},\ldots,X_{i_n})$,
where $X \overset{d}{=} Y$ means that X and Y are identically distributed RVs.


Definition 7. The RVs $X_1,X_2,\ldots,X_n$ are said to be exchangeable if

$$(X_1,X_2,\ldots,X_n) \overset{d}{=} (X_{i_1},X_{i_2},\ldots,X_{i_n})$$

for all n! permutations $(i_1,i_2,\ldots,i_n)$ of $(1,2,\ldots,n)$. The RVs in the sequence $\{X_n\}$ are said
to be exchangeable if $X_1,X_2,\ldots,X_n$ are exchangeable for each n.


Clearly if $X_1,X_2,\ldots,X_n$ are exchangeable, then the $X_i$ are identically distributed but not
necessarily independent.


Example 5. Suppose X, Y, Z have joint PDF

$$f(x,y,z) = \begin{cases} \tfrac{2}{3}(x + y + z), & 0 < x < 1,\ 0 < y < 1,\ 0 < z < 1,\\ 0, & \text{otherwise.} \end{cases}$$

Then X, Y, Z are exchangeable but not independent.

Example 6. Let $X_1,X_2,\ldots,X_n$ be iid RVs. Let $S_n = \sum_{j=1}^{n} X_j$, $n = 1,2,\ldots$ and $Y_k =
X_k - S_n/n$, $k = 1,2,\ldots,n-1$. Then $Y_1,Y_2,\ldots,Y_{n-1}$ are exchangeable.


Theorem 6. Let X, Y be exchangeable RVs. Then X — Y has a symmetric distribution. 


Definition 8. Let X be an RV, and let $X'$ be an RV that is independent of X and $X' \overset{d}{=} X$.
We call the RV

$$X^s = X - X'$$

the symmetrized X.


In view of Theorem 6, $X^s$ is symmetric about 0 so that

$$P\{X^s \ge 0\} \ge \frac{1}{2} \quad \text{and} \quad P\{X^s \le 0\} \ge \frac{1}{2}.$$

If $E|X| < \infty$, then $E|X^s| \le 2E|X| < \infty$, and $EX^s = 0$.
The technique of symmetrization is an important tool in the study of probability limit 
theorems. We will need the following result later. The proof is left to the reader. 


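Theorem 7. For $\varepsilon > 0$,

(a) $P\{|X^s| > \varepsilon\} \le 2P\{|X| > \varepsilon/2\}$.
(b) If $a \ge 0$ is such that $P\{X \ge a\} \le 1 - p$ and $P\{X \le -a\} \le 1 - p$, then

$$P\{|X^s| \ge \varepsilon\} \ge p\,P\{|X| > a + \varepsilon\},$$

for $\varepsilon > 0$.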


PROBLEMS 4.3 


1. Let A be a set of k numbers, and $\Omega$ be the set of all ordered samples of size n from
A with replacement. Also, let $\mathcal{S}$ be the set of all subsets of $\Omega$, and P be a probability
defined on $\mathcal{S}$. Let $X_1,X_2,\ldots,X_n$ be RVs defined on $(\Omega, \mathcal{S}, P)$ by setting

$$X_i(a_1,a_2,\ldots,a_n) = a_i, \quad (i = 1,2,\ldots,n).$$

Show that $X_1,X_2,\ldots,X_n$ are independent if and only if each sample point is equally
likely.

2. Let $X_1, X_2$ be iid RVs with common PMF

$$P\{X = \pm 1\} = \frac{1}{2}.$$

Write $X_3 = X_1 X_2$. Show that $X_1, X_2, X_3$ are pairwise independent but not
independent.

3. Let $(X_1,X_2,X_3)$ be an RV with joint PMF

$$f(x_1,x_2,x_3) = \frac{1}{4} \quad \text{if } (x_1,x_2,x_3) \in A,$$
$$= 0 \quad \text{otherwise,}$$

where

$$A = \{(1,0,0), (0,1,0), (0,0,1), (1,1,1)\}.$$

Are $X_1, X_2, X_3$ independent? Are $X_1, X_2, X_3$ pairwise independent? Are $X_1 + X_2$ and
$X_3$ independent?

4. Let X and Y be independent RVs such that XY is degenerate at $c \ne 0$. That is,
$P(XY = c) = 1$. Show that X and Y are also degenerate.

5. Let $(\Omega, \mathcal{S}, P)$ be a probability space and $A, B \in \mathcal{S}$. Define X and Y so that

$$X(\omega) = I_A(\omega), \quad Y(\omega) = I_B(\omega) \quad \text{for all } \omega \in \Omega.$$

Show that X and Y are independent if and only if A and B are independent.

6. Let $X_1,X_2,\ldots,X_n$ be a set of exchangeable RVs. Then

$$E\left\{ \frac{X_1 + X_2 + \cdots + X_k}{X_1 + X_2 + \cdots + X_n} \right\} = \frac{k}{n}, \quad 1 \le k \le n.$$

7. Let X and Y be identically distributed. Construct an example to show that X and Y
need not be equal, that is, $P\{X = Y\}$ need not equal 1.

8. Prove Lemma 1.

9. Let $X_1,X_2,\ldots,X_n$ be RVs with joint PDF f, and let $f_j$ be the marginal PDF of $X_j$ ($j =
1,2,\ldots,n$). Show that $X_1,X_2,\ldots,X_n$ are independent if and only if

$$f(x_1,x_2,\ldots,x_n) = \prod_{j=1}^{n} f_j(x_j) \quad \text{for all } (x_1,x_2,\ldots,x_n) \in \mathcal{R}_n.$$

10. Suppose two buses, A and B, operate on a route. A person arrives at a certain bus
stop on this route at time 0. Let X and Y be the arrival times of buses A and B,
respectively, at this bus stop. Suppose X and Y are independent and have density
functions given, respectively, by

$$f_1(x) = \frac{1}{a}, \quad 0 \le x \le a, \quad \text{and 0 elsewhere,}$$
$$f_2(y) = \frac{1}{b}, \quad 0 \le y \le b, \quad \text{and 0 otherwise.}$$

What is the probability that bus A will arrive before bus B?

11. Consider two batteries, one of Brand A and the other of Brand B. Brand A batteries
have a length of life with density function

$$f(x) = 3\lambda x^2 \exp(-\lambda x^3), \quad x > 0, \quad \text{and 0 elsewhere,}$$

whereas Brand B batteries have a length of life with density function given by

$$g(y) = 3\mu y^2 \exp(-\mu y^3), \quad y > 0, \quad \text{and 0 elsewhere.}$$

Brand A and Brand B batteries operate independently and are put to a test. What
is the probability that Brand B battery will outlast Brand A? In particular, what is
the probability if $\lambda = \mu$?

12. (a) Let (X,Y) have joint density f. Show that X and Y are independent if and only
if for some constant $k > 0$ and nonnegative functions $f_1$ and $f_2$

$$f(x,y) = k f_1(x) f_2(y)$$

for all $x, y \in \mathcal{R}$.
(b) Let $A = \{f_X(x) > 0\}$, $B = \{f_Y(y) > 0\}$, where $f_X$, $f_Y$ are the marginal densities of X
and Y, respectively. Show that if X and Y are independent then $\{f > 0\} = A \times B$.

13. If $\varphi$ is the CF of X, show that the CF of $X^s$ is real and even.

14. Let X, Y be jointly distributed with PDF $f(x,y) = (1 - x^3 y)/4$ for $|x| < 1$, $|y| < 1$,
and = 0 otherwise. Show that $X \overset{d}{=} Y$ and $X - Y$ has a symmetric distribution.


4.4 FUNCTIONS OF SEVERAL RANDOM VARIABLES 


Let $X_1,X_2,\ldots,X_n$ be RVs defined on a probability space $(\Omega, \mathcal{S}, P)$. In practice we deal with
functions of $X_1,X_2,\ldots,X_n$ such as $X_1 + X_2$, $X_1 - X_2$, $X_1 X_2$, $\min(X_1,\ldots,X_n)$, and so on. Are
these also RVs? If so, how do we compute their distribution given the joint distribution of
$X_1,X_2,\ldots,X_n$?

What functions of $(X_1,X_2,\ldots,X_n)$ are RVs?


Theorem 1. Let $g: \mathcal{R}_n \to \mathcal{R}_m$ be a Borel-measurable function, that is, if $B \in \mathfrak{B}_m$, then
$g^{-1}(B) \in \mathfrak{B}_n$. If $X = (X_1,X_2,\ldots,X_n)$ is an n-dimensional RV ($n \ge 1$), then $g(X)$ is an
m-dimensional RV.


Proof. For $B \in \mathfrak{B}_m$

$$\{g(X_1,X_2,\ldots,X_n) \in B\} = \{(X_1,X_2,\ldots,X_n) \in g^{-1}(B)\},$$

and, since $g^{-1}(B) \in \mathfrak{B}_n$, it follows that $\{(X_1,X_2,\ldots,X_n) \in g^{-1}(B)\} \in \mathcal{S}$, which concludes
the proof.


In particular, if $g: \mathcal{R}_n \to \mathcal{R}_m$ is a continuous function, then $g(X_1,X_2,\ldots,X_n)$ is an RV.

How do we compute the distribution of $g(X_1,X_2,\ldots,X_n)$? There are several ways to
go about it. We first consider the method of distribution functions. Suppose that $Y =
g(X_1,\ldots,X_n)$ is real-valued, and let $y \in \mathcal{R}$. Then

$$P\{Y \le y\} = P(g(X_1,\ldots,X_n) \le y) = \begin{cases} \displaystyle\sum_{\{(x_1,\ldots,x_n):\, g(x_1,\ldots,x_n) \le y\}} P(X_1 = x_1,\ldots,X_n = x_n) & \text{in the discrete case,}\\[8pt] \displaystyle\int_{\{(x_1,\ldots,x_n):\, g(x_1,\ldots,x_n) \le y\}} f(x_1,\ldots,x_n)\,dx_1 \cdots dx_n & \text{in the continuous case,} \end{cases}$$

where in the continuous case f is the joint PDF of $(X_1,\ldots,X_n)$.

In the continuous case we can obtain the PDF of $Y = g(X_1,\ldots,X_n)$ by differentiating
the DF $P\{Y \le y\}$ with respect to y provided that Y is also of the continuous type. In the
discrete case it is easier to compute $P\{g(X_1,\ldots,X_n) = y\}$.

We consider a few examples.


Example 1. Consider the bivariate negative binomial distribution with PMF

$$P\{X = x, Y = y\} = \frac{(x+y+k-1)!}{x!\,y!\,(k-1)!}\, p_1^x p_2^y (1 - p_1 - p_2)^k,$$

where $x, y = 0,1,2,\ldots$; $k \ge 1$ is an integer; $p_1, p_2 \in (0,1)$; and $p_1 + p_2 < 1$. Let us find
the PMF of $U = X + Y$. We introduce an RV $V = Y$ (see Remark 1 below) so that $u =
x + y$, $v = y$ represents a one-to-one mapping of $A = \{(x,y): x, y = 0,1,2,\ldots\}$ onto the
set $B = \{(u,v): v = 0,1,2,\ldots,u;\ u = 0,1,2,\ldots\}$ with inverse mapping $x = u - v$, $y = v$. It
follows that the joint PMF of (U,V) is given by

$$P\{U = u, V = v\} = \begin{cases} \dfrac{(u+k-1)!}{(u-v)!\,v!\,(k-1)!}\, p_1^{u-v} p_2^{v} (1 - p_1 - p_2)^k & \text{for } (u,v) \in B,\\ 0 & \text{otherwise.} \end{cases}$$

The marginal PMF of U is given by

$$P\{U = u\} = \frac{(u+k-1)!\,(1 - p_1 - p_2)^k}{(k-1)!\,u!} \sum_{v=0}^{u} \binom{u}{v} p_1^{u-v} p_2^{v}$$
$$= \frac{(u+k-1)!\,(1 - p_1 - p_2)^k}{(k-1)!\,u!}\,(p_1 + p_2)^u$$
$$= \binom{u+k-1}{u} (p_1 + p_2)^u (1 - p_1 - p_2)^k \quad (u = 0,1,2,\ldots).$$
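
The closed form for the PMF of $U = X + Y$ can be compared with a direct summation of the joint PMF over $\{x + y = u\}$. In the sketch below the parameter values in the usage note (k = 3, $p_1 = 0.2$, $p_2 = 0.3$) and the function names are illustrative assumptions.

```python
import math


def joint_pmf(x, y, k, p1, p2):
    """Bivariate negative binomial PMF P{X = x, Y = y} of Example 1."""
    coeff = math.factorial(x + y + k - 1) / (
        math.factorial(x) * math.factorial(y) * math.factorial(k - 1)
    )
    return coeff * p1**x * p2**y * (1.0 - p1 - p2) ** k


def pmf_u_by_summation(u, k, p1, p2):
    """P{U = u} obtained by summing the joint PMF of (U, V) over v = 0, ..., u."""
    return sum(joint_pmf(u - v, v, k, p1, p2) for v in range(u + 1))


def pmf_u_closed_form(u, k, p1, p2):
    """P{U = u} from the closed form: C(u+k-1, u) (p1+p2)^u (1-p1-p2)^k."""
    return math.comb(u + k - 1, u) * (p1 + p2) ** u * (1.0 - p1 - p2) ** k
```

For instance, with k = 3, $p_1 = 0.2$, $p_2 = 0.3$ the two functions agree for every u up to rounding error, which reflects the fact that U is negative binomial with success parameter $1 - p_1 - p_2$.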


Example 2. Let $(X_1,X_2)$ have uniform distribution on the triangle $\{0 \le x_1 \le x_2 \le 1\}$, that
is, $(X_1,X_2)$ has joint density function

$$f(x_1,x_2) = \begin{cases} 2, & 0 \le x_1 \le x_2 \le 1,\\ 0, & \text{elsewhere.} \end{cases}$$

Let $Y = X_1 + X_2$. Then for $y < 0$, $P(Y \le y) = 0$, and for $y > 2$, $P(Y \le y) = 1$. For $0 \le y \le 2$,
we have

$$P(Y \le y) = P(X_1 + X_2 \le y) = \iint_{\substack{0 \le x_1 \le x_2 \le 1\\ x_1 + x_2 \le y}} f(x_1,x_2)\,dx_1\,dx_2.$$

Fig. 1 (a) $\{x_1 + x_2 \le y,\ 0 \le x_1 \le x_2 \le 1,\ 0 < y \le 1\}$ and (b) $\{x_1 + x_2 \le y,\ 0 \le x_1 \le x_2 \le 1,\ 1 < y \le 2\}$.

There are two cases to consider according to whether $0 < y \le 1$ or $1 < y \le 2$ (Fig. 1a
and 1b). In the former case,

$$P(Y \le y) = \int_{x_1=0}^{y/2} \left( \int_{x_2=x_1}^{y-x_1} 2\,dx_2 \right) dx_1 = 2 \int_0^{y/2} (y - 2x_1)\,dx_1 = \frac{y^2}{2},$$

and in the latter case,

$$P(Y \le y) = 1 - P(Y > y) = 1 - \int_{x_2=y/2}^{1} \left( \int_{x_1=y-x_2}^{x_2} 2\,dx_1 \right) dx_2 = 1 - 2 \int_{y/2}^{1} (2x_2 - y)\,dx_2 = 1 - \frac{(y-2)^2}{2}.$$

Hence the density function of Y is given by

$$f_Y(y) = \begin{cases} y, & 0 \le y \le 1,\\ 2 - y, & 1 < y \le 2,\\ 0, & \text{elsewhere.} \end{cases}$$
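
The DF obtained by the method of distribution functions can be checked by simulation. The sketch below uses the fact that the ordered pair (min, max) of two independent uniform(0,1) draws is uniform on the triangle $\{0 \le x_1 \le x_2 \le 1\}$ with density 2; the function names, seed, and sample size are illustrative assumptions.

```python
import random


def cdf_y(y):
    """DF of Y = X1 + X2 derived in Example 2."""
    if y <= 0.0:
        return 0.0
    if y <= 1.0:
        return y * y / 2.0
    if y <= 2.0:
        return 1.0 - (y - 2.0) ** 2 / 2.0
    return 1.0


def simulate_cdf(y, n=100000, seed=0):
    """Monte Carlo estimate of P{X1 + X2 <= y} for (X1, X2) uniform on the triangle."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        u, v = rng.random(), rng.random()
        x1, x2 = min(u, v), max(u, v)  # uniform on {0 <= x1 <= x2 <= 1}
        if x1 + x2 <= y:
            count += 1
    return count / n
```

Since $X_1 + X_2$ equals the sum of the two unordered uniform draws, the triangular density y on [0,1] and 2 - y on (1,2] is exactly what one would expect.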


The method of distribution functions can also be used in the case when g takes values
in $\mathcal{R}_m$, $1 < m \le n$, but the integration becomes more involved.


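Example 3. Let $X_1$ be the time that a customer takes from getting in line at a service desk
in a bank to completion of service, and let $X_2$ be the time she waits in line before she
reaches the service desk. Then $X_1 \ge X_2$ and $X_1 - X_2$ is the service time of the customer.
Suppose the joint density of $(X_1,X_2)$ is given by

$$f(x_1,x_2) = \begin{cases} e^{-x_1}, & 0 \le x_2 \le x_1 < \infty,\\ 0, & \text{elsewhere.} \end{cases}$$

Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$. Then the joint distribution of $(Y_1,Y_2)$ is given by

$$P(Y_1 \le y_1,\ Y_2 \le y_2) = \iint_A f(x_1,x_2)\,dx_1\,dx_2,$$

where $A = \{(x_1,x_2): x_1 + x_2 \le y_1,\ x_1 - x_2 \le y_2,\ 0 \le x_2 \le x_1 < \infty\}$. Clearly, $x_1 + x_2 \ge
x_1 - x_2$ so that the set A is as shown in Fig. 2. It follows that, for $0 \le y_2 \le y_1$,

Fig. 2 $\{x_1 + x_2 \le y_1,\ x_1 - x_2 \le y_2,\ 0 \le x_2 \le x_1 < \infty\}$.

$$P(Y_1 \le y_1,\ Y_2 \le y_2) = \int_{x_2=0}^{(y_1-y_2)/2} \left( \int_{x_1=x_2}^{x_2+y_2} e^{-x_1}\,dx_1 \right) dx_2 + \int_{x_2=(y_1-y_2)/2}^{y_1/2} \left( \int_{x_1=x_2}^{y_1-x_2} e^{-x_1}\,dx_1 \right) dx_2$$

$$= \int_0^{(y_1-y_2)/2} e^{-x_2}(1 - e^{-y_2})\,dx_2 + \int_{(y_1-y_2)/2}^{y_1/2} (e^{-x_2} - e^{-y_1+x_2})\,dx_2$$

$$= (1 - e^{-y_2})(1 - e^{-(y_1-y_2)/2}) + (e^{-(y_1-y_2)/2} - e^{-y_1/2}) - e^{-y_1}(e^{y_1/2} - e^{(y_1-y_2)/2})$$

$$= 1 - e^{-y_2} - 2e^{-y_1/2} + 2e^{-(y_1+y_2)/2}.$$

Hence the joint density of $Y_1, Y_2$ is given by

$$f_{Y_1,Y_2}(y_1,y_2) = \begin{cases} \tfrac{1}{2} e^{-(y_1+y_2)/2}, & 0 \le y_2 \le y_1 < \infty,\\ 0, & \text{elsewhere.} \end{cases}$$

The marginal densities of $Y_1, Y_2$ are easily obtained by integrating the joint density:

$$f_{Y_2}(y_2) = \int_{y_2}^{\infty} \tfrac{1}{2} e^{-(y_1+y_2)/2}\,dy_1 = e^{-y_2} \quad \text{for } y_2 \ge 0, \text{ and 0 elsewhere;}$$
$$f_{Y_1}(y_1) = \int_0^{y_1} \tfrac{1}{2} e^{-(y_1+y_2)/2}\,dy_2 = e^{-y_1/2}(1 - e^{-y_1/2}) \quad \text{for } y_1 \ge 0, \text{ and 0 elsewhere.}$$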
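
A simulation gives an independent check on the joint DF of $(Y_1,Y_2)$. The sketch below relies on one observation that is not stated in the example: the density $e^{-x_1}$ on $\{0 \le x_2 \le x_1\}$ factors as $e^{-x_2} e^{-(x_1 - x_2)}$, so the waiting time $X_2$ and the service time $X_1 - X_2$ can be drawn as independent exponential RVs with mean 1. Function names, seed, and sample size are illustrative.

```python
import math
import random


def joint_cdf(y1, y2):
    """P(Y1 <= y1, Y2 <= y2) from Example 3, valid for 0 <= y2 <= y1."""
    return (1.0 - math.exp(-y2) - 2.0 * math.exp(-y1 / 2.0)
            + 2.0 * math.exp(-(y1 + y2) / 2.0))


def simulate_joint_cdf(y1, y2, n=100000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        wait = rng.expovariate(1.0)     # X2, time spent in line
        service = rng.expovariate(1.0)  # X1 - X2, service time
        x1 = wait + service
        if x1 + wait <= y1 and service <= y2:  # Y1 = X1 + X2, Y2 = X1 - X2
            count += 1
    return count / n
```

At $(y_1,y_2) = (2,1)$ the formula gives approximately 0.3426, and the simulated frequency should be close to that value.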


We next consider the method of transformations. Let $(X_1,\ldots,X_n)$ be jointly distributed
with continuous PDF $f(x_1,x_2,\ldots,x_n)$, and let $y = g(x_1,x_2,\ldots,x_n) = (y_1,y_2,\ldots,y_n)$, where

$$y_i = g_i(x_1,x_2,\ldots,x_n), \quad i = 1,2,\ldots,n,$$

be a mapping of $\mathcal{R}_n$ to $\mathcal{R}_n$. Then

$$P\{(Y_1,Y_2,\ldots,Y_n) \in B\} = P\{(X_1,X_2,\ldots,X_n) \in g^{-1}(B)\} = \int_{g^{-1}(B)} f(x_1,x_2,\ldots,x_n) \prod_{i=1}^{n} dx_i,$$

where $g^{-1}(B) = \{x = (x_1,x_2,\ldots,x_n) \in \mathcal{R}_n: g(x) \in B\}$. Let us choose B to be the
n-dimensional interval

$$B = B_y = \{(y_1',y_2',\ldots,y_n'): -\infty < y_i' \le y_i,\ i = 1,2,\ldots,n\}.$$

Then the joint DF of Y is given by

$$P\{Y \in B_y\} = G_Y(y) = P\{g_1(X) \le y_1,\ g_2(X) \le y_2,\ \ldots,\ g_n(X) \le y_n\} = \int \cdots \int_{g^{-1}(B_y)} f(x_1,x_2,\ldots,x_n) \prod_{i=1}^{n} dx_i,$$

and (if $G_Y$ is absolutely continuous) the PDF of Y is given by

$$w(y) = \frac{\partial^n G_Y(y)}{\partial y_1\,\partial y_2 \cdots \partial y_n}$$

at every continuity point y of w. Under certain conditions it is possible to write w in terms
of f by making a change of variable in the multiple integral.


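Theorem 2. Let $(X_1,X_2,\ldots,X_n)$ be an n-dimensional RV of the continuous type with PDF
$f(x_1,x_2,\ldots,x_n)$.

(a) Let

$$y_1 = g_1(x_1,x_2,\ldots,x_n),$$
$$y_2 = g_2(x_1,x_2,\ldots,x_n),$$
$$\vdots$$
$$y_n = g_n(x_1,x_2,\ldots,x_n),$$

be a one-to-one mapping of $\mathcal{R}_n$ into itself, that is, there exists the inverse transformation

$$x_1 = h_1(y_1,y_2,\ldots,y_n), \quad x_2 = h_2(y_1,y_2,\ldots,y_n), \quad \ldots, \quad x_n = h_n(y_1,y_2,\ldots,y_n)$$

defined over the range of the transformation.
(b) Assume that both the mapping and its inverse are continuous.
(c) Assume that the partial derivatives

$$\frac{\partial x_i}{\partial y_j}, \quad 1 \le i \le n,\ 1 \le j \le n,$$

exist and are continuous.
(d) Assume that the Jacobian J of the inverse transformation

$$J = \frac{\partial(x_1,\ldots,x_n)}{\partial(y_1,\ldots,y_n)} = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n}\\[6pt] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_2}{\partial y_n}\\ \vdots & \vdots & & \vdots\\ \dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{vmatrix}$$

is different from 0 for $(y_1,y_2,\ldots,y_n)$ in the range of the transformation.

Then $(Y_1,Y_2,\ldots,Y_n)$ has a joint absolutely continuous DF with PDF given by

$$w(y_1,y_2,\ldots,y_n) = |J|\, f(h_1(y_1,\ldots,y_n),\ldots,h_n(y_1,\ldots,y_n)). \tag{1}$$

Proof. For $(y_1,y_2,\ldots,y_n) \in \mathcal{R}_n$, let

$$B = \{(y_1',y_2',\ldots,y_n') \in \mathcal{R}_n: -\infty < y_i' \le y_i,\ i = 1,2,\ldots,n\}.$$

Then

$$g^{-1}(B) = \{x \in \mathcal{R}_n: g(x) \in B\} = \{(x_1,x_2,\ldots,x_n): g_i(x) \le y_i,\ i = 1,2,\ldots,n\}$$

and

$$G_Y(y) = P\{Y \in B\} = P\{X \in g^{-1}(B)\}$$
$$= \int \cdots \int_{g^{-1}(B)} f(x_1,x_2,\ldots,x_n)\,dx_1\,dx_2 \cdots dx_n$$
$$= \int_{-\infty}^{y_1} \cdots \int_{-\infty}^{y_n} f(h_1(y'),\ldots,h_n(y')) \left| \frac{\partial(x_1,x_2,\ldots,x_n)}{\partial(y_1',y_2',\ldots,y_n')} \right| dy_1' \cdots dy_n'.$$

Result (1) now follows on differentiation of DF $G_Y$.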


Remark 1. In actual applications we will not know the mapping from $x_1,x_2,\ldots,x_n$ to
$y_1,y_2,\ldots,y_n$ completely, but one or more of the functions $g_i$ will be known. If only
k, $1 \le k < n$, of the $g_i$'s are known, we introduce arbitrarily $n - k$ functions such that the
conditions of the theorem are satisfied. To find the joint marginal density of these k variables
we simply integrate the w function over all the $n - k$ variables that were arbitrarily
introduced.


Remark 2. An analog of Theorem 2.5.4 holds, which we state without proof.

Let $X = (X_1,X_2,\ldots,X_n)$ be an RV of the continuous type with joint PDF f, and let
$y_i = g_i(x_1,x_2,\ldots,x_n)$, $i = 1,2,\ldots,n$, be a mapping of $\mathcal{R}_n$ into itself. Suppose that for each
y the transformation g has a finite number $k = k(y)$ of inverses. Suppose further that $\mathcal{R}_n$
can be partitioned into k disjoint sets $A_1,A_2,\ldots,A_k$, such that the transformation g from
$A_i$ ($i = 1,2,\ldots,k$) into $\mathcal{R}_n$ is one-to-one with inverse transformation

$$x_1 = h_{1i}(y_1,y_2,\ldots,y_n),\ \ldots,\ x_n = h_{ni}(y_1,y_2,\ldots,y_n), \quad i = 1,2,\ldots,k.$$

Suppose that the first partial derivatives are continuous and that each Jacobian

$$J_i = \begin{vmatrix} \dfrac{\partial h_{1i}}{\partial y_1} & \dfrac{\partial h_{1i}}{\partial y_2} & \cdots & \dfrac{\partial h_{1i}}{\partial y_n}\\[6pt] \dfrac{\partial h_{2i}}{\partial y_1} & \dfrac{\partial h_{2i}}{\partial y_2} & \cdots & \dfrac{\partial h_{2i}}{\partial y_n}\\ \vdots & \vdots & & \vdots\\ \dfrac{\partial h_{ni}}{\partial y_1} & \dfrac{\partial h_{ni}}{\partial y_2} & \cdots & \dfrac{\partial h_{ni}}{\partial y_n} \end{vmatrix}$$

is different from 0 in the range of the transformation. Then the joint PDF of Y is given by

$$w(y_1,y_2,\ldots,y_n) = \sum_{i=1}^{k} |J_i|\, f(h_{1i}(y_1,y_2,\ldots,y_n),\ldots,h_{ni}(y_1,y_2,\ldots,y_n)).$$


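Example 4. Let $X_1, X_2, X_3$ be iid RVs with common exponential density function

\[ f(x) = \begin{cases} e^{-x} & \text{if } x > 0, \\ 0 & \text{otherwise.} \end{cases} \]

Also, let

\[ Y_1 = X_1 + X_2 + X_3, \qquad Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}, \qquad Y_3 = \frac{X_1}{X_1 + X_2}. \]

Then

\[ x_1 = y_1 y_2 y_3, \qquad x_2 = y_1 y_2 - x_1 = y_1 y_2 (1 - y_3), \qquad x_3 = y_1 - y_1 y_2 = y_1 (1 - y_2). \]

The Jacobian of transformation is given by

\[ J = \begin{vmatrix} y_2 y_3 & y_1 y_3 & y_1 y_2 \\ y_2 (1 - y_3) & y_1 (1 - y_3) & -y_1 y_2 \\ 1 - y_2 & -y_1 & 0 \end{vmatrix} = -y_1^2 y_2. \]

Note that $0 < y_1 < \infty$, $0 < y_2 < 1$, and $0 < y_3 < 1$. Thus the joint PDF of $Y_1, Y_2, Y_3$ is
given by

\[ w(y_1, y_2, y_3) = y_1^2 y_2 e^{-y_1} = (2 y_2) \left( \tfrac{1}{2} y_1^2 e^{-y_1} \right), \qquad 0 < y_1 < \infty,\ 0 < y_2, y_3 < 1. \]

It follows that $Y_1$, $Y_2$, and $Y_3$ are independent.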
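
The Jacobian in Example 4 can be checked numerically. The following is a minimal sketch, not part of the original text: it differentiates the inverse map by central differences at an arbitrarily chosen point and compares the determinant with the closed form $-y_1^2 y_2$. The function names and the test point are assumptions made for illustration.

```python
def inverse_map(y1, y2, y3):
    """Inverse transformation of Example 4: (y1, y2, y3) -> (x1, x2, x3)."""
    return (y1 * y2 * y3, y1 * y2 * (1.0 - y3), y1 * (1.0 - y2))


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def numeric_jacobian(y, eps=1e-6):
    """Central-difference matrix of partials d(x_i)/d(y_j) at the point y."""
    rows = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        up, down = list(y), list(y)
        up[j] += eps
        down[j] -= eps
        x_up, x_down = inverse_map(*up), inverse_map(*down)
        for i in range(3):
            rows[i][j] = (x_up[i] - x_down[i]) / (2 * eps)
    return rows


y = (1.7, 0.4, 0.25)  # arbitrary point with y1 > 0 and 0 < y2, y3 < 1 (assumption)
J_numeric = det3(numeric_jacobian(y))
J_closed = -y[0] ** 2 * y[1]
print(J_numeric, J_closed)
```

The two printed values should agree up to rounding error, since every coordinate of the inverse map is linear in each $y_j$ separately.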


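Example 5. Let $X_1, X_2$ be independent RVs with common density given by

\[ f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \]

Let $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$. Then the Jacobian of the transformation is given by

\[ J = \begin{vmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} \end{vmatrix} = -\frac{1}{2}, \]

and the joint density of $Y_1, Y_2$ (Fig. 3) is given by

\[ f_{Y_1, Y_2}(y_1, y_2) = \frac{1}{2} f\left( \frac{y_1 + y_2}{2} \right) f\left( \frac{y_1 - y_2}{2} \right) = \frac{1}{2} \quad \text{if } 0 < \frac{y_1 + y_2}{2} < 1,\ 0 < \frac{y_1 - y_2}{2} < 1, \]

that is, if $(y_1, y_2) \in \{0 < y_1 + y_2 < 2,\ 0 < y_1 - y_2 < 2\}$.

Fig. 3 $\{0 < y_1 + y_2 < 2,\ 0 < y_1 - y_2 < 2\}$.

The marginal PDFs of $Y_1$ and $Y_2$ are given by

\[ f_{Y_1}(y_1) = \begin{cases} \int_{-y_1}^{y_1} \tfrac{1}{2}\, dy_2 = y_1, & 0 < y_1 \le 1, \\ \int_{y_1 - 2}^{2 - y_1} \tfrac{1}{2}\, dy_2 = 2 - y_1, & 1 < y_1 < 2, \\ 0, & \text{otherwise;} \end{cases} \]

\[ f_{Y_2}(y_2) = \begin{cases} \int_{-y_2}^{y_2 + 2} \tfrac{1}{2}\, dy_1 = y_2 + 1, & -1 < y_2 \le 0, \\ \int_{y_2}^{2 - y_2} \tfrac{1}{2}\, dy_1 = 1 - y_2, & 0 < y_2 < 1, \\ 0, & \text{otherwise.} \end{cases} \]


Example 6. Let $X_1, X_2, X_3$ be iid RVs with common PDF

\[ f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad -\infty < x < \infty. \]

Let $Y_1 = (X_1 - X_2)/\sqrt{2}$, $Y_2 = (X_1 + X_2 - 2X_3)/\sqrt{6}$, and $Y_3 = (X_1 + X_2 + X_3)/\sqrt{3}$. Then

\[ x_1 = \frac{y_1}{\sqrt{2}} + \frac{y_2}{\sqrt{6}} + \frac{y_3}{\sqrt{3}}, \qquad
x_2 = -\frac{y_1}{\sqrt{2}} + \frac{y_2}{\sqrt{6}} + \frac{y_3}{\sqrt{3}}, \qquad
x_3 = -\frac{\sqrt{2}\, y_2}{\sqrt{3}} + \frac{y_3}{\sqrt{3}}. \]

The Jacobian of transformation is given by

\[ J = \begin{vmatrix}
\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{6}} & \dfrac{1}{\sqrt{3}} \\
-\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{6}} & \dfrac{1}{\sqrt{3}} \\
0 & -\dfrac{\sqrt{2}}{\sqrt{3}} & \dfrac{1}{\sqrt{3}}
\end{vmatrix} = 1. \]

The joint PDF of $X_1, X_2, X_3$ is given by

\[ g(x_1, x_2, x_3) = \frac{1}{(\sqrt{2\pi})^3} \exp\left\{ -\frac{x_1^2 + x_2^2 + x_3^2}{2} \right\}, \qquad x_1, x_2, x_3 \in \mathcal{R}. \]

It is easily checked that

\[ x_1^2 + x_2^2 + x_3^2 = y_1^2 + y_2^2 + y_3^2, \]

so that the joint PDF of $Y_1, Y_2, Y_3$ is given by

\[ w(y_1, y_2, y_3) = \frac{1}{(\sqrt{2\pi})^3} \exp\left\{ -\frac{y_1^2 + y_2^2 + y_3^2}{2} \right\}. \]

It follows that $Y_1, Y_2, Y_3$ are also iid RVs with common PDF $f$.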


In Example 6 the transformation used is orthogonal and is known as Helmert's transformation.
In fact, we will show in Section 6.5 that under orthogonal transformations iid
RVs with PDF $f$ defined above are transformed into iid RVs with the same PDF.

It is easily verified that

\[ y_1^2 + y_2^2 = \sum_{j=1}^{3} \left( x_j - \frac{x_1 + x_2 + x_3}{3} \right)^2. \]

We have therefore proved that $(X_1 + X_2 + X_3)$ is independent of $\sum_{j=1}^{3} \{X_j - [(X_1 + X_2 + X_3)/3]\}^2$.
This is a very important result in mathematical statistics, and we will return to
it in Section 7.7.
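
The two identities used above (orthogonality of Helmert's transformation and the sum-of-squares decomposition) are easy to confirm numerically. The sketch below is illustrative only; the matrix name `A`, the helper functions, and the test point `x` are assumptions, not notation from the text.

```python
import math

# Rows of A give y = A x for the transformation of Example 6.
A = [
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],
    [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)],
    [1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)],
]


def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]


def gram(matrix):
    """matrix * matrix^T, which is the identity when the matrix is orthogonal."""
    return [[sum(a * b for a, b in zip(r, s)) for s in matrix] for r in matrix]


x = [0.3, -1.2, 2.5]  # arbitrary test point (assumption)
y = apply(A, x)
xbar = sum(x) / 3
sum_sq_x = sum(v * v for v in x)
sum_sq_y = sum(v * v for v in y)
centered = sum((v - xbar) ** 2 for v in x)

print(gram(A))
print(sum_sq_x, sum_sq_y)
print(y[0] ** 2 + y[1] ** 2, centered)
```

Each printed pair should agree up to floating-point rounding, and the Gram matrix should be the identity.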
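Example 7. Let $(X,Y)$ be a bivariate normal RV with joint PDF

\[ f(x,y) = \frac{1}{2\pi\sigma_1\sigma_2(1-\rho^2)^{1/2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2} \right] \right\}, \]

$-\infty < x < \infty$, $-\infty < y < \infty$; $\mu_1 \in \mathcal{R}$, $\mu_2 \in \mathcal{R}$; and $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$.

Let

\[ U_1 = \sqrt{X^2 + Y^2}, \qquad U_2 = \frac{X}{Y}. \]

For $u_1 > 0$,

\[ \sqrt{x^2 + y^2} = u_1 \quad \text{and} \quad \frac{x}{y} = u_2 \]

have two solutions:

\[ x_1 = \frac{u_1 u_2}{\sqrt{1 + u_2^2}}, \quad y_1 = \frac{u_1}{\sqrt{1 + u_2^2}}, \quad \text{and} \quad x_2 = -x_1, \quad y_2 = -y_1, \]

for any $u_2 \in \mathcal{R}$. The Jacobians are given by

\[ J_1 = J_2 = \begin{vmatrix}
\dfrac{u_2}{\sqrt{1 + u_2^2}} & \dfrac{u_1}{(1 + u_2^2)^{3/2}} \\
\dfrac{1}{\sqrt{1 + u_2^2}} & -\dfrac{u_1 u_2}{(1 + u_2^2)^{3/2}}
\end{vmatrix} = -\frac{u_1}{1 + u_2^2}. \]

It follows from the result in Remark 2 that the joint PDF of $(U_1, U_2)$ is given by

\[ w(u_1, u_2) = \begin{cases}
\dfrac{u_1}{1 + u_2^2} \left[ f\left( \dfrac{u_1 u_2}{\sqrt{1 + u_2^2}}, \dfrac{u_1}{\sqrt{1 + u_2^2}} \right) + f\left( \dfrac{-u_1 u_2}{\sqrt{1 + u_2^2}}, \dfrac{-u_1}{\sqrt{1 + u_2^2}} \right) \right] & \text{if } u_1 > 0,\ u_2 \in \mathcal{R}, \\
0 & \text{otherwise.}
\end{cases} \]

In the special case where $\mu_1 = \mu_2 = 0$, $\rho = 0$, and $\sigma_1 = \sigma_2 = \sigma$, we have

\[ f(x,y) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2}, \]

so that $X$ and $Y$ are independent. Moreover,

\[ f(x,y) = f(-x,-y), \]

and it follows that when $X$ and $Y$ are independent

\[ w(u_1, u_2) = \begin{cases}
\dfrac{1}{2\pi\sigma^2} \dfrac{2u_1}{1 + u_2^2} e^{-u_1^2/2\sigma^2}, & u_1 > 0,\ -\infty < u_2 < \infty, \\
0, & \text{otherwise.}
\end{cases} \]

Since

\[ w(u_1, u_2) = \frac{1}{\pi(1 + u_2^2)} \cdot \frac{u_1}{\sigma^2} e^{-u_1^2/2\sigma^2}, \]

it follows that $U_1$ and $U_2$ are independent with marginal PDFs given by

\[ w_1(u_1) = \begin{cases} \dfrac{u_1}{\sigma^2} e^{-u_1^2/2\sigma^2}, & u_1 > 0, \\ 0, & u_1 \le 0 \end{cases} \]

and

\[ w_2(u_2) = \frac{1}{\pi(1 + u_2^2)}, \qquad -\infty < u_2 < \infty, \]

respectively.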
An important application of the result in Remark 2 will appear in Theorem 4.7.2. 
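Theorem 3. Let $(X,Y)$ be an RV of the continuous type with PDF $f$. Let
$Z = X + Y$, $U = X - Y$, $V = XY$;
and let $W = X/Y$. Then the PDFs of $Z$, $U$, $V$, and $W$ are, respectively, given by

\[ f_Z(z) = \int_{-\infty}^{\infty} f(x, z - x)\, dx, \tag{2} \]

\[ f_U(u) = \int_{-\infty}^{\infty} f(u + y, y)\, dy, \tag{3} \]

\[ f_V(v) = \int_{-\infty}^{\infty} f\left( x, \frac{v}{x} \right) \frac{1}{|x|}\, dx, \tag{4} \]

\[ f_W(w) = \int_{-\infty}^{\infty} f(xw, x)\, |x|\, dx. \tag{5} \]


Proof. The proof is left as an exercise.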
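Corollary. If $X$ and $Y$ are independent with PDFs $f_1$ and $f_2$, respectively, then

\[ f_Z(z) = \int_{-\infty}^{\infty} f_1(x) f_2(z - x)\, dx, \tag{6} \]

\[ f_U(u) = \int_{-\infty}^{\infty} f_1(u + y) f_2(y)\, dy, \tag{7} \]

\[ f_V(v) = \int_{-\infty}^{\infty} f_1(x) f_2\left( \frac{v}{x} \right) \frac{1}{|x|}\, dx, \tag{8} \]

\[ f_W(w) = \int_{-\infty}^{\infty} f_1(xw) f_2(x)\, |x|\, dx. \tag{9} \]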
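
As an illustration of the convolution formula (6), the sketch below approximates the integral by a midpoint rule for two uniform densities on $(0,1)$ and compares the result with the triangular density found for $Y_1 = X_1 + X_2$ in Example 5. The function names, integration limits, and step count are assumptions chosen for this illustration.

```python
def uniform_pdf(x):
    """Density of the uniform distribution on (0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def convolve_pdf(f1, f2, z, lo=-1.0, hi=2.0, steps=30000):
    """Midpoint-rule approximation of the integral of f1(x) f2(z - x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += f1(x) * f2(z - x)
    return total * h


def triangular_pdf(z):
    """Closed-form density of the sum of two independent uniform(0, 1) RVs."""
    if 0.0 < z <= 1.0:
        return z
    if 1.0 < z < 2.0:
        return 2.0 - z
    return 0.0


for z in (0.25, 0.5, 1.0, 1.5):
    print(z, convolve_pdf(uniform_pdf, uniform_pdf, z), triangular_pdf(z))
```

The numerical and closed-form columns should agree to within the grid spacing; the small discrepancy comes from the jump discontinuities of the uniform density.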


Remark 3. Let $F$ and $G$ be two absolutely continuous DFs; then

\[ H(x) = \int_{-\infty}^{\infty} F(x - y) G'(y)\, dy = \int_{-\infty}^{\infty} G(x - y) F'(y)\, dy \]

is also an absolutely continuous DF with PDF

\[ H'(x) = \int_{-\infty}^{\infty} F'(x - y) G'(y)\, dy = \int_{-\infty}^{\infty} G'(x - y) F'(y)\, dy. \]

If

\[ F(x) = \sum_k p_k\, \varepsilon(x - x_k) \quad \text{and} \quad G(x) = \sum_j q_j\, \varepsilon(x - y_j) \]

are two DFs, then

\[ H(x) = \sum_k \sum_j p_k q_j\, \varepsilon(x - x_k - y_j) \]

is also a DF of an RV of the discrete type. The DF $H$ is called the convolution of $F$ and $G$,
and we write $H = F * G$. Clearly, the operation is commutative and associative; that is, if
$F_1, F_2, F_3$ are DFs, $F_1 * F_2 = F_2 * F_1$ and $(F_1 * F_2) * F_3 = F_1 * (F_2 * F_3)$. In this terminology,
if $X$ and $Y$ are independent RVs with DFs $F$ and $G$, respectively, $X + Y$ has the convolution
DF $H = F * G$. Extension to an arbitrary number of independent RVs is obvious.


Finally, we consider a technique based on MGF or CF which can be used in certain
situations to determine the distribution of a function $g(X_1, X_2, \ldots, X_n)$ of $X_1, X_2, \ldots, X_n$.

Let $(X_1, X_2, \ldots, X_n)$ be an $n$-variate RV, and $g$ be a Borel-measurable function from $\mathcal{R}_n$
to $\mathcal{R}_1$.
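Definition 1. If $(X_1, X_2, \ldots, X_n)$ is discrete type and
$\sum_{x_1, \ldots, x_n} |g(x_1, x_2, \ldots, x_n)|\, P\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} < \infty$, then the series

\[ E g(X_1, X_2, \ldots, X_n) = \sum_{x_1, \ldots, x_n} g(x_1, x_2, \ldots, x_n)\, P\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} \]

is called the expected value of $g(X_1, X_2, \ldots, X_n)$. If $(X_1, X_2, \ldots, X_n)$ is a continuous type
RV with joint PDF $f$, and if

\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |g(x_1, x_2, \ldots, x_n)|\, f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i < \infty, \]

then

\[ E g(X_1, X_2, \ldots, X_n) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n)\, f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i \]

is called the expected value of $g(X_1, X_2, \ldots, X_n)$.


Let $Y = g(X_1, X_2, \ldots, X_n)$, and let $h(y)$ be its PDF. If $E|Y| < \infty$ then

\[ EY = \int_{-\infty}^{\infty} y\, h(y)\, dy. \]

An analog of Theorem 3.2.1 holds. That is,

\[ \int_{-\infty}^{\infty} y\, h(y)\, dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n)\, f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i, \]

in the sense that if either integral exists so does the other and the two are equal. The result
also holds in the discrete case.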
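Some special functions of interest are $\sum_{j=1}^{n} x_j$, $\prod_{j=1}^{n} x_j^{k_j}$, where $k_1, k_2, \ldots, k_n$ are nonnegative
integers, $e^{\sum_{j=1}^{n} t_j x_j}$, where $t_1, t_2, \ldots, t_n$ are real numbers, and $e^{i \sum_{j=1}^{n} t_j x_j}$, where
$i = \sqrt{-1}$.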


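Definition 2. Let $X_1, X_2, \ldots, X_n$ be jointly distributed. If $E e^{\sum_{j=1}^{n} t_j X_j}$ exists for $|t_j| \le h_j$,
$j = 1, 2, \ldots, n$, for some $h_j > 0$, $j = 1, 2, \ldots, n$, we write

\[ M(t_1, t_2, \ldots, t_n) = E\left( e^{t_1 X_1 + t_2 X_2 + \cdots + t_n X_n} \right) \tag{10} \]

and call it the MGF of the joint distribution of $(X_1, X_2, \ldots, X_n)$ or, simply, the MGF of
$(X_1, X_2, \ldots, X_n)$.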


Definition 3. Let $t_1, t_2, \ldots, t_n$ be real numbers and $i = \sqrt{-1}$. Then the CF of
$(X_1, X_2, \ldots, X_n)$ is defined by

\[ \phi(t_1, t_2, \ldots, t_n) = E\left\{ \exp\left( i \sum_{j=1}^{n} t_j X_j \right) \right\}
= E\left\{ \cos\left( \sum_{j=1}^{n} t_j X_j \right) \right\} + i E\left\{ \sin\left( \sum_{j=1}^{n} t_j X_j \right) \right\}. \tag{11} \]

As in the univariate case $\phi(t_1, t_2, \ldots, t_n)$ always exists.


We will mostly deal with MGF even though the condition that it exist for $|t_j| \le h_j$,
$j = 1, 2, \ldots, n$ restricts its application considerably. The multivariate MGF (CF) has properties
similar to the univariate MGF discussed earlier. We state some of these without
proof. For notational convenience we restrict ourselves to the bivariate case.


Theorem 4. The MGF $M(t_1, t_2)$ uniquely determines the joint distribution of $(X,Y)$, and
conversely, if the MGF exists it is unique.


Corollary. The MGF $M(t_1, t_2)$ completely determines the marginal distributions of $X$ and
$Y$. Indeed,

\[ M(t_1, 0) = E(e^{t_1 X}) = M_X(t_1), \tag{12} \]
\[ M(0, t_2) = E(e^{t_2 Y}) = M_Y(t_2). \tag{13} \]


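Theorem 5. If $M(t_1, t_2)$ exists, the moments of all orders of $(X,Y)$ exist and may be
obtained from

\[ \left. \frac{\partial^{m+n} M(t_1, t_2)}{\partial t_1^m\, \partial t_2^n} \right|_{t_1 = t_2 = 0} = E(X^m Y^n). \tag{14} \]

Thus,

\[ \frac{\partial M(0,0)}{\partial t_1} = EX, \qquad \frac{\partial M(0,0)}{\partial t_2} = EY, \]
\[ \frac{\partial^2 M(0,0)}{\partial t_1^2} = EX^2, \qquad \frac{\partial^2 M(0,0)}{\partial t_2^2} = EY^2, \]
\[ \frac{\partial^2 M(0,0)}{\partial t_1\, \partial t_2} = E(XY), \]

and so on.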


A formal definition of moments in the multivariate case will be given in Section 4.5. 
Theorem 6. $X$ and $Y$ are independent RVs if and only if

\[ M(t_1, t_2) = M(t_1, 0)\, M(0, t_2) \quad \text{for all } t_1, t_2 \in \mathcal{R}. \tag{15} \]


Proof. Let $X$ and $Y$ be independent. Then,

\[ M(t_1, t_2) = E\{e^{t_1 X + t_2 Y}\} = E(e^{t_1 X}) E(e^{t_2 Y}) = M(t_1, 0) M(0, t_2). \]

Conversely, if

\[ M(t_1, t_2) = M(t_1, 0) M(0, t_2), \]

then, in the continuous case,

\[ \int\!\!\int e^{t_1 x + t_2 y} f(x,y)\, dx\, dy = \left[ \int e^{t_1 x} f_1(x)\, dx \right] \left[ \int e^{t_2 y} f_2(y)\, dy \right], \]

that is,

\[ \int\!\!\int e^{t_1 x + t_2 y} f(x,y)\, dx\, dy = \int\!\!\int e^{t_1 x + t_2 y} f_1(x) f_2(y)\, dx\, dy. \]

By the uniqueness of the MGF (Theorem 4) we must have

\[ f(x,y) = f_1(x) f_2(y) \quad \text{for all } (x,y) \in \mathcal{R}_2. \]

It follows that $X$ and $Y$ are independent. A similar proof is given in the case where $(X,Y)$
is of the discrete type.


The MGF technique uses the uniqueness property of Theorem 4. In order to find the
distribution (DF, PDF, or PMF) of $Y = g(X_1, X_2, \ldots, X_n)$ we compute the MGF of $Y$ using
the definition. If this MGF is one of a known kind, then $Y$ must have that kind of distribution.
Although the technique applies to the case when $Y$ is an $m$-dimensional RV, $1 \le m \le n$, we
will mostly use it for the $m = 1$ case.


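Example 8. Let us first consider a simple case when $X$ is normal with PDF

\[ f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad -\infty < x < \infty. \]

Let $Y = X^2$. Then

\[ M_Y(s) = E e^{sX^2} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(1 - 2s)x^2/2}\, dx = (1 - 2s)^{-1/2}, \qquad s < \tfrac{1}{2}. \]

This is the MGF of a chi-square RV with one degree of freedom, so that $Y$ has PDF

\[ w(y) = \frac{1}{\sqrt{2\pi}}\, y^{-1/2} e^{-y/2}, \qquad y > 0. \]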
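
The closed form $E e^{sX^2} = (1 - 2s)^{-1/2}$ for a standard normal $X$ and $s < 1/2$ can be checked by direct numerical integration. This is a minimal sketch under that assumption; the function name, truncation width, and step count are choices made here for illustration.

```python
import math


def mgf_x_squared(s, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of E exp(s X^2) for standard normal X, s < 1/2."""
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        # Integrand exp(s x^2) * exp(-x^2 / 2), combined to avoid overflow.
        total += math.exp((s - 0.5) * x * x)
    return total * h / math.sqrt(2 * math.pi)


for s in (-1.0, 0.0, 0.25):
    print(s, mgf_x_squared(s), (1 - 2 * s) ** -0.5)
```

The truncation to $[-12, 12]$ is harmless for the values of $s$ shown because the integrand is negligible outside that interval; values of $s$ close to $1/2$ would need a wider range.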


Example 9. Suppose $X_1$ and $X_2$ are independent with common PDF $f$ of Example 8. Let
$Y_1 = X_1 - X_2$. There are three equivalent ways to use MGF technique here. Let $Y_2 = X_2$.
Then rather than compute

\[ M(s_1, s_2) = E e^{s_1 Y_1 + s_2 Y_2} \]

it is simpler to recognize that $Y_1$ is univariate so

\[ M_{Y_1}(s) = E e^{s(X_1 - X_2)} = (E e^{sX_1})(E e^{-sX_2}) = e^{s^2/2} e^{s^2/2} = e^{s^2}. \]

It follows that $Y_1$ has PDF

\[ f(x) = \frac{1}{\sqrt{4\pi}} e^{-x^2/4}, \qquad -\infty < x < \infty. \]

Note that $M_{Y_1}(s) = M(s, 0)$.
Let $Y_3 = X_1 + X_2$. Let us find the joint distribution of $Y_1$ and $Y_3$. Indeed

\[ \begin{aligned}
E\left( e^{s_1 Y_1 + s_2 Y_3} \right) &= E\left( e^{(s_1 + s_2) X_1} e^{(s_2 - s_1) X_2} \right) \\
&= E\left( e^{(s_1 + s_2) X_1} \right) E\left( e^{(s_2 - s_1) X_2} \right) \\
&= e^{(s_1 + s_2)^2/2}\, e^{(s_1 - s_2)^2/2} = e^{s_1^2}\, e^{s_2^2},
\end{aligned} \]

and it follows that $Y_1$ and $Y_3$ are independent RVs with common PDF $f$ defined above.
The following result has many applications as we will see. Example 9 is a special case.


Theorem 7. Let $X_1, X_2, \ldots, X_n$ be independent RVs with respective MGFs $M_i(s)$,
$i = 1, 2, \ldots, n$. Then the MGF of $Y = \sum_{i=1}^{n} a_i X_i$ for real numbers $a_1, a_2, \ldots, a_n$ is given by

\[ M_Y(s) = \prod_{i=1}^{n} M_i(a_i s). \]

Proof. If $M_i$ exists for $|s| < h_i$, $h_i > 0$, then $M_Y$ exists for $|s| < \min(h_1, \ldots, h_n)$ and

\[ M_Y(s) = E e^{s \sum_{i=1}^{n} a_i X_i} = \prod_{i=1}^{n} E e^{s a_i X_i} = \prod_{i=1}^{n} M_i(a_i s). \]


Corollary. If $X_i$'s are iid, then the MGF of $Y = \sum_{i=1}^{n} X_i$ is given by $M_Y(s) = [M(s)]^n$.


Remark 4. The converse of Theorem 7 does not hold. We leave the reader to construct an 
example illustrating this fact. 


Example 10. Let $X_1, X_2, \ldots, X_m$ be iid RVs with common PMF

\[ P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n;\ 0 < p < 1. \]

Then the MGF of $X_i$ is given by

\[ M(t) = (1 - p + pe^t)^n. \]

It follows that the MGF of $S_m = X_1 + X_2 + \cdots + X_m$ is

\[ M_{S_m}(t) = \prod_{1}^{m} (1 - p + pe^t)^n = (1 - p + pe^t)^{mn}, \]

and we see that $S_m$ has the PMF

\[ P\{S_m = s\} = \binom{mn}{s} p^s (1 - p)^{mn - s}, \qquad s = 0, 1, 2, \ldots, mn. \]
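
The conclusion of Example 10 can be cross-checked without MGFs by convolving the binomial PMF with itself $m$ times and comparing with the binomial$(mn, p)$ PMF. The sketch below does this for one hypothetical choice of $n$, $m$, and $p$; the helper names are assumptions made for illustration.

```python
from math import comb


def binomial_pmf(n, p):
    """List of P{X = k}, k = 0..n, for a binomial(n, p) RV."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """PMF of the sum of two independent nonnegative integer RVs with PMFs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


n, m, p = 4, 3, 0.3  # hypothetical parameter values
pmf_sum = [1.0]      # PMF of the constant 0
for _ in range(m):
    pmf_sum = convolve(pmf_sum, binomial_pmf(n, p))

target = binomial_pmf(m * n, p)
max_gap = max(abs(u - v) for u, v in zip(pmf_sum, target))
print(max_gap)
```

The printed gap should be at the level of floating-point rounding.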


From these examples it is clear that to use this technique effectively one must be able 
to recognize the MGF of the function under consideration. In Chapter 5 we will study a 
number of commonly occurring probability distributions and derive their MGFs (whenever 
they exist). We will have occasion to use Theorem 7 quite frequently. 

For integer-valued RVs one can sometimes use PGFs to compute the distribution of 
certain functions of a multiple RV. 

We emphasize the fact that a CF always exists and analogs of Theorems 4—7 can be 
stated in terms of CFs. 


PROBLEMS 4.4 


1. Let F be a DF and € be a positive real number. Show that 


x+e 
wie) == | F(x) dx 


and 


are also distribution functions. 
2. Let $X, Y$ be iid RVs with common PDF

\[ f(x) = \begin{cases} e^{-x} & \text{if } x > 0, \\ 0 & \text{if } x \le 0. \end{cases} \]

(a) Find the PDF of RVs $X + Y$, $X - Y$, $XY$, $X/Y$, $\min\{X,Y\}$, $\max\{X,Y\}$,
$\min\{X,Y\}/\max\{X,Y\}$, and $X/(X + Y)$.
(b) Let $U = X + Y$ and $V = X - Y$. Find the conditional PDF of $V$, given $U = u$,
for some fixed $u > 0$.
(c) Show that $U$ and $Z = X/(X + Y)$ are independent.

3. Let $X$ and $Y$ be independent RVs defined on the space $(\Omega, \mathcal{S}, P)$. Let $X$ be uniformly
distributed on $(-a, a)$, $a > 0$, and $Y$ be an RV of the continuous type with density $f$,
where $f$ is continuous and positive on $\mathcal{R}$. Let $F$ be the DF of $Y$. If $u_0 \in (-a, a)$ is a
fixed number, show that

\[ f_{Y|X+Y}(y \mid u_0) = \begin{cases} \dfrac{f(y)}{F(u_0 + a) - F(u_0 - a)} & \text{if } u_0 - a < y < u_0 + a, \\ 0 & \text{otherwise,} \end{cases} \]

where $f_{Y|X+Y}(y \mid u_0)$ is the conditional density function of $Y$, given $X + Y = u_0$.
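4. Let $X$ and $Y$ be iid RVs with common PDF

\[ f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \]

Find the PDFs of RVs $XY$, $X/Y$, $\min\{X,Y\}$, $\max\{X,Y\}$, $\min\{X,Y\}/\max\{X,Y\}$.
5. Let $X_1, X_2, X_3$ be iid RVs with common density function

\[ f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \]

Show that the PDF of $U = X_1 + X_2 + X_3$ is given by

\[ g(u) = \begin{cases}
\dfrac{u^2}{2}, & 0 \le u < 1, \\
3u - u^2 - \dfrac{3}{2}, & 1 \le u < 2, \\
\dfrac{(u - 3)^2}{2}, & 2 \le u \le 3, \\
0, & \text{elsewhere.}
\end{cases} \]

An extension to the $n$-variate case holds.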
6. Let $X$ and $Y$ be independent RVs with common geometric PMF

\[ P\{X = k\} = \pi (1 - \pi)^k, \qquad k = 0, 1, 2, \ldots;\ 0 < \pi < 1. \]

Also, let $M = \max\{X, Y\}$. Find the joint distribution of $M$ and $X$, the marginal
distribution of $M$, and the conditional distribution of $X$, given $M$.

7. Let $X$ be a nonnegative RV of the continuous type. The integral part, $Y$, of $X$ is distributed
with PMF $P\{Y = k\} = \lambda^k e^{-\lambda}/k!$, $k = 0, 1, 2, \ldots$, $\lambda > 0$; and the fractional
part, $Z$, of $X$ has PDF $f_Z(z) = 1$ if $0 \le z \le 1$, and $= 0$ otherwise. Find the PDF of
$X$, assuming that $Y$ and $Z$ are independent.
8. Let $X$ and $Y$ be independent RVs. If at least one of $X$ and $Y$ is of the continuous
type, show that $X + Y$ is also continuous. What if $X$ and $Y$ are not independent?

9. Let $X$ and $Y$ be independent integral RVs. Show that

\[ P(t) = P_X(t) P_Y(t), \]

where $P$, $P_X$, and $P_Y$, respectively, are the PGFs of $X + Y$, $X$, and $Y$.

10. Let $X$ and $Y$ be independent nonnegative RVs of the continuous type with PDFs $f$
and $g$, respectively. Let $f(x) = e^{-x}$ if $x > 0$, and $= 0$ if $x \le 0$, and let $g$ be arbitrary.
Show that the MGF $M(t)$ of $Y$, which is assumed to exist, has the property that the
DF of $X/Y$ is $1 - M(-t)$.

11. Let $X, Y, Z$ have the joint PDF

\[ f(x, y, z) = \begin{cases} 6(1 + x + y + z)^{-4} & \text{if } 0 < x,\ 0 < y,\ 0 < z, \\ 0 & \text{otherwise.} \end{cases} \]

Find the PDF of $U = X + Y + Z$.
12. Let $X$ and $Y$ be iid RVs with common PDF


feel? Da) te C/eey x0, 
0, x<0. 


Find the PDF of $Z = XY$.


13. Let $X$ and $Y$ be iid RVs with common PDF $f$ defined in Example 8. Find the joint
PDF of $U$ and $V$ in the following cases:

(a) $U = \sqrt{X^2 + Y^2}$, $V = \tan^{-1}(X/Y)$, $-(\pi/2) < V \le (\pi/2)$.

(b) $U = (X + Y)/2$, $V = (X - Y)^2/2$.

14. Construct an example to show that even when the MGF of $X + Y$ can be written as
a product of the MGF of $X$ and the MGF of $Y$, $X$ and $Y$ need not be independent.
15. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF

\[ f(x) = \frac{1}{b - a}, \quad a < x < b, \qquad = 0 \quad \text{otherwise.} \]

Using the distribution function technique show that
(a) The joint PDF of $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$, and $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$
is given by

\[ u(x, y) = \frac{n(n-1)(x - y)^{n-2}}{(b - a)^n}, \quad a < y < x < b, \]

and $= 0$ otherwise.
(b) The PDF of $X_{(n)}$ is given by

\[ g(z) = \frac{n(z - a)^{n-1}}{(b - a)^n}, \quad a < z < b, \qquad = 0 \quad \text{otherwise,} \]

and that of $X_{(1)}$ by

\[ h(z) = \frac{n(b - z)^{n-1}}{(b - a)^n}, \quad a < z < b, \qquad = 0 \quad \text{otherwise.} \]
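16. Let $X_1, X_2$ be iid with common Poisson PMF

\[ P(X_i = x) = \frac{\lambda^x}{x!} e^{-\lambda}, \qquad x = 0, 1, 2, \ldots,\ i = 1, 2, \]

where $\lambda > 0$ is a constant. Let $X_{(2)} = \max(X_1, X_2)$ and $X_{(1)} = \min(X_1, X_2)$. Find
the PMF of $X_{(2)}$.
17. Let $X$ have the binomial PMF

\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, \ldots, n;\ 0 < p < 1. \]

Let $Y$ be independent of $X$ and $Y \stackrel{d}{=} X$. Find the PMF of $U = X + Y$ and $W = X - Y$.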


4.5 COVARIANCE, CORRELATION AND MOMENTS


Let $X$ and $Y$ be jointly distributed on $(\Omega, \mathcal{S}, P)$. In Section 4.4 we defined $Eg(X,Y)$ for
Borel functions $g$ on $\mathcal{R}_2$. Functions of the form $g(x,y) = x^j y^k$ where $j$ and $k$ are nonnegative
integers are of interest in probability and statistics.


Definition 1. If $E|X^j Y^k| < \infty$ for nonnegative integers $j$ and $k$, we call $E(X^j Y^k)$ a moment
of order $(j + k)$ of $(X,Y)$ and write

\[ m_{jk} = E(X^j Y^k). \tag{1} \]

Clearly,

\[ m_{10} = EX, \quad m_{01} = EY, \quad m_{20} = EX^2, \quad m_{11} = EXY, \quad m_{02} = EY^2. \tag{2} \]


Definition 2. If $E|(X - EX)^j (Y - EY)^k| < \infty$ for nonnegative integers $j$ and $k$, we call
$E\{(X - EX)^j (Y - EY)^k\}$ a central moment of order $(j + k)$ and write

\[ \mu_{jk} = E\{(X - EX)^j (Y - EY)^k\}. \tag{3} \]

Clearly,

\[ \mu_{10} = \mu_{01} = 0, \quad \mu_{20} = \operatorname{var}(X), \quad \mu_{02} = \operatorname{var}(Y), \quad \mu_{11} = E\{(X - m_{10})(Y - m_{01})\}. \tag{4} \]

We see easily that

\[ \mu_{11} = E(XY) - EX\, EY. \tag{5} \]
Note that if $X$ and $Y$ increase (or decrease) together then $(X - EX)(Y - EY)$ should be positive,
whereas if $X$ decreases while $Y$ increases (and conversely) then the product should be
negative. Hence the average value of $(X - EX)(Y - EY)$, namely $\mu_{11}$, provides a measure
of association or joint variation between $X$ and $Y$.


Definition 3. If $E\{(X - EX)(Y - EY)\}$ exists, we call it the covariance between $X$ and $Y$
and write

\[ \operatorname{cov}(X,Y) = E\{(X - EX)(Y - EY)\} = E(XY) - EX\, EY. \tag{6} \]

Recall (Theorem 3.2.8) that $E\{Y - a\}^2$ is minimized when we choose $a = EY$ so that
$EY$ may be interpreted as the best constant predictor of $Y$. If instead, we choose to predict
$Y$ by a linear function of $X$, say $aX + b$, and measure the error in this prediction by
$E\{Y - aX - b\}^2$, then we should choose $a$ and $b$ to minimize this so-called mean square error.
Clearly, $E(Y - aX - b)^2$ is minimized, for any $a$, by choosing $b = E(Y - aX) = EY - aEX$.
With this choice of $b$, we find $a$ such that

\[ E(Y - aX - b)^2 = E\{(Y - EY) - a(X - EX)\}^2 = \sigma_Y^2 - 2a\mu_{11} + a^2 \sigma_X^2 \]

is minimum. An easy computation shows that the minimum occurs if we choose

\[ a = \frac{\mu_{11}}{\sigma_X^2}, \tag{7} \]

provided $\sigma_X^2 > 0$. Moreover,

\[ \begin{aligned}
\min_{a,b} E(Y - aX - b)^2 &= \min_a \{\sigma_Y^2 - 2a\mu_{11} + a^2 \sigma_X^2\} \\
&= \sigma_Y^2 - \frac{\mu_{11}^2}{\sigma_X^2} \\
&= \sigma_Y^2 \left[ 1 - \left( \frac{\mu_{11}}{\sigma_X \sigma_Y} \right)^2 \right].
\end{aligned} \tag{8} \]

Let us write

\[ \rho = \frac{\mu_{11}}{\sigma_X \sigma_Y}. \tag{9} \]

Then (8) shows that predicting $Y$ by a linear function of $X$ reduces the prediction error
from $\sigma_Y^2$ to $\sigma_Y^2 (1 - \rho^2)$. We may therefore think of $\rho$ as a measure of the linear dependence
between RVs $X$ and $Y$.
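
Relations (7) and (8) can be verified exactly on a small discrete distribution. The sketch below is illustrative only: the joint PMF is hypothetical, and the helper `E` is an assumed name for the expectation operator. Exact rational arithmetic is used so that the identity holds without rounding.

```python
from fractions import Fraction as F

# Hypothetical joint PMF on four points, chosen only for illustration.
pmf = {(0, 0): F(3, 10), (1, 0): F(1, 10), (0, 1): F(2, 10), (1, 2): F(4, 10)}


def E(g):
    """Expectation of g(X, Y) under the joint PMF above."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


EX, EY = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - EX) ** 2)
var_y = E(lambda x, y: (y - EY) ** 2)
mu11 = E(lambda x, y: (x - EX) * (y - EY))

a = mu11 / var_x   # slope from equation (7)
b = EY - a * EX    # intercept minimizing the error for this slope
mse = E(lambda x, y: (y - a * x - b) ** 2)
rho_sq = mu11 ** 2 / (var_x * var_y)

print(a, b, mse, var_y * (1 - rho_sq))
```

The last two printed values coincide, which is the statement that the minimum mean square error equals $\sigma_Y^2 (1 - \rho^2)$.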


Definition 4. If $EX^2$, $EY^2$ exist, we define the correlation coefficient between $X$ and $Y$ as

\[ \rho = \frac{\operatorname{cov}(X,Y)}{SD(X)\, SD(Y)} = \frac{EXY - EX\, EY}{\sqrt{EX^2 - (EX)^2}\, \sqrt{EY^2 - (EY)^2}}, \tag{10} \]

where $SD(X)$ denotes the standard deviation of RV $X$.
We note that for any two real numbers $a$ and $b$

\[ |ab| \le \frac{a^2 + b^2}{2}, \]

so that $E|XY| < \infty$ if $EX^2 < \infty$ and $EY^2 < \infty$.


Definition 5. We say that RVs $X$ and $Y$ are uncorrelated if $\rho = 0$, or equivalently,
$\operatorname{cov}(X,Y) = 0$.


If $X$ and $Y$ are independent, then from (5) $\operatorname{cov}(X,Y) = 0$, and $X$ and $Y$ are uncorrelated.
If, however, $\rho = 0$ then $X$ and $Y$ may not necessarily be independent.


Example 1. Let $U$ and $V$ be two RVs with common mean and common variance. Let
$X = U + V$ and $Y = U - V$. Then

\[ \operatorname{cov}(X,Y) = E(U^2 - V^2) - E(U + V)E(U - V) = 0 \]

so that $X$ and $Y$ are uncorrelated but not necessarily independent. See Example 4.4.9.


Let us now study some properties of the correlation coefficient. From the definition we
see that $\rho$ (and also $\operatorname{cov}(X,Y)$) is symmetric in $X$ and $Y$.


Theorem 1.
(a) The correlation coefficient $\rho$ between two RVs $X$ and $Y$ satisfies

\[ |\rho| \le 1. \tag{11} \]

(b) The equality $|\rho| = 1$ holds if and only if there exist constants $a \ne 0$ and $b$ such that
$P\{aX + b = Y\} = 1$.


Proof. From (8) since $E(Y - aX - b)^2 \ge 0$, we must have $1 - \rho^2 \ge 0$, or equivalently, (11)
holds.

Equality in (11) holds if and only if $\rho^2 = 1$, or equivalently, $E(Y - aX - b)^2 = 0$ holds.
This implies and is implied by $P(Y = aX + b) = 1$. Here $a \ne 0$.


Remark 1. From (7) and (9) we note that the signs of $a$ and $\rho$ are the same, so if $\rho = 1$ then
$P(Y = aX + b) = 1$ where $a > 0$, and if $\rho = -1$ then $a < 0$.


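Theorem 2. Let $EX^2 < \infty$, $EY^2 < \infty$, and let $U = aX + b$, $V = cY + d$ with $ac \ne 0$. Then,

\[ \rho_{X,Y} = \pm \rho_{U,V}, \]

where $\rho_{X,Y}$ and $\rho_{U,V}$, respectively, are the correlation coefficients between $X$ and $Y$ and $U$
and $V$; the sign is that of $ac$.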


Proof. The proof is simple and is left as an exercise. 
Example 2. Let $X, Y$ be identically distributed with common PMF

\[ P\{X = k\} = \frac{1}{N}, \qquad k = 1, 2, \ldots, N\ (N > 1). \]

Then

\[ EX = EY = \frac{N + 1}{2}, \qquad EX^2 = EY^2 = \frac{(N + 1)(2N + 1)}{6}, \]

so that

\[ \operatorname{var}(X) = \operatorname{var}(Y) = \frac{N^2 - 1}{12}. \]

Also,

\[ E(XY) = \frac{1}{2}\{EX^2 + EY^2 - E(X - Y)^2\} = \frac{(N + 1)(2N + 1)}{6} - \frac{E(X - Y)^2}{2}. \]

Thus,

\[ \operatorname{cov}(X,Y) = \frac{(N + 1)(2N + 1)}{6} - \frac{E(X - Y)^2}{2} - \frac{(N + 1)^2}{4} = \frac{(N + 1)(N - 1)}{12} - \frac{1}{2} E(X - Y)^2 \]

and

\[ \rho_{X,Y} = \frac{(N^2 - 1)/12 - E(X - Y)^2/2}{(N^2 - 1)/12} = 1 - \frac{6E(X - Y)^2}{N^2 - 1}. \]

If $P\{X = Y\} = 1$, then $\rho = 1$, and conversely. If $P\{Y = N + 1 - X\} = 1$, then

\[ E(X - Y)^2 = E(2X - N - 1)^2 = \frac{4(N + 1)(2N + 1)}{6} - \frac{4(N + 1)^2}{2} + (N + 1)^2, \]

and it follows that $\rho_{X,Y} = -1$. Conversely, if $\rho_{X,Y} = -1$, from Remark 1 it follows that
$Y = -aX + b$ with probability 1 for some $a > 0$ and some real number $b$. To find $a$ and $b$,
we note that $EY = -aEX + b$ so that $b = [(N + 1)/2](1 + a)$. Also $EY^2 = E(b - aX)^2$,
which yields

\[ (1 - a^2) EX^2 + 2ab\, EX - b^2 = 0. \]

Substituting for $b$ in terms of $a$ and the values of $EX^2$ and $EX$, we see that $a^2 = 1$, so that
$a = 1$. Hence, $b = N + 1$, and it follows that $Y = N + 1 - X$ with probability 1.
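Example 3. Let $(X,Y)$ be jointly distributed with density function

\[ f(x,y) = \begin{cases} x + y, & 0 < x < 1,\ 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases} \]

Then

\[ \begin{aligned}
E X^l Y^m &= \int_0^1 \int_0^1 x^l y^m (x + y)\, dx\, dy \\
&= \int_0^1 \int_0^1 x^{l+1} y^m\, dx\, dy + \int_0^1 \int_0^1 x^l y^{m+1}\, dx\, dy \\
&= \frac{1}{(l + 2)(m + 1)} + \frac{1}{(l + 1)(m + 2)},
\end{aligned} \]

where $l$ and $m$ are positive integers. Thus

\[ EX = EY = \frac{7}{12}, \qquad EX^2 = EY^2 = \frac{5}{12}, \]
\[ \operatorname{var}(X) = \operatorname{var}(Y) = \frac{5}{12} - \frac{49}{144} = \frac{11}{144}, \]
\[ \operatorname{cov}(X,Y) = \frac{1}{3} - \frac{49}{144} = -\frac{1}{144}, \qquad \rho = -\frac{1}{11}. \]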
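
The arithmetic in Example 3 can be reproduced exactly from the closed-form moment expression. The following sketch uses Python's `fractions` module; the function name `moment` is an assumption made for illustration.

```python
from fractions import Fraction as F


def moment(l, m):
    """E(X^l Y^m) for the density f(x, y) = x + y on the unit square."""
    return F(1, (l + 2) * (m + 1)) + F(1, (l + 1) * (m + 2))


EX, EY = moment(1, 0), moment(0, 1)
var_x = moment(2, 0) - EX ** 2
var_y = moment(0, 2) - EY ** 2
cov_xy = moment(1, 1) - EX * EY
# var_x equals var_y here, so sqrt(var_x * var_y) reduces to var_x.
rho = cov_xy / var_x

print(EX, var_x, cov_xy, rho)
```

Running it prints the mean, variance, covariance, and correlation coefficient as exact fractions.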


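Theorem 3. Let $X_1, X_2, \ldots, X_n$ be RVs such that $E|X_i| < \infty$, $i = 1, 2, \ldots, n$. Let
$a_1, a_2, \ldots, a_n$ be real numbers, and write

\[ S = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n. \]

Then $ES$ exists, and we have

\[ ES = \sum_{j=1}^{n} a_j EX_j. \tag{12} \]


Proof. If $(X_1, X_2, \ldots, X_n)$ is of the discrete type, then

\[ \begin{aligned}
ES &= \sum_{i_1, i_2, \ldots, i_n} (a_1 x_{i_1} + a_2 x_{i_2} + \cdots + a_n x_{i_n})\, P\{X_1 = x_{i_1}, X_2 = x_{i_2}, \ldots, X_n = x_{i_n}\} \\
&= a_1 \sum_{i_1} x_{i_1} \sum_{i_2, \ldots, i_n} P\{X_1 = x_{i_1}, \ldots, X_n = x_{i_n}\} + \cdots + a_n \sum_{i_n} x_{i_n} \sum_{i_1, \ldots, i_{n-1}} P\{X_1 = x_{i_1}, \ldots, X_n = x_{i_n}\} \\
&= a_1 \sum_{i_1} x_{i_1} P\{X_1 = x_{i_1}\} + \cdots + a_n \sum_{i_n} x_{i_n} P\{X_n = x_{i_n}\} \\
&= a_1 EX_1 + \cdots + a_n EX_n.
\end{aligned} \]

The existence of $ES$ follows easily by replacing each $a_j$ by $|a_j|$ and each $x_{i_j}$ by $|x_{i_j}|$ and
remembering that $E|X_j| < \infty$, $j = 1, 2, \ldots, n$. The case of continuous type $(X_1, X_2, \ldots, X_n)$
is similarly treated.


Corollary. Take $a_1 = a_2 = \cdots = a_n = 1/n$. Then

\[ E\left( \frac{X_1 + X_2 + \cdots + X_n}{n} \right) = \frac{1}{n} \sum_{i=1}^{n} EX_i, \]

and if $EX_1 = EX_2 = \cdots = EX_n = \mu$, then

\[ E\left( \frac{X_1 + X_2 + \cdots + X_n}{n} \right) = \mu. \]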


Theorem 4. Let $X_1, X_2, \ldots, X_n$ be independent RVs such that $E|X_i| < \infty$, $i = 1, 2, \ldots, n$.
Then $E\left( \prod_{i=1}^{n} X_i \right)$ exists and

\[ E\left( \prod_{i=1}^{n} X_i \right) = \prod_{i=1}^{n} EX_i. \tag{13} \]


Let $X$ and $Y$ be independent, and $g_1(\cdot)$ and $g_2(\cdot)$ be Borel-measurable functions. Then
we know (Theorem 4.3.3) that $g_1(X)$ and $g_2(Y)$ are independent. If $E\{g_1(X)\}$, $E\{g_2(Y)\}$,
and $E\{g_1(X) g_2(Y)\}$ exist, it follows from Theorem 4 that

\[ E\{g_1(X) g_2(Y)\} = E\{g_1(X)\}\, E\{g_2(Y)\}. \tag{14} \]

Conversely, if for any Borel sets $A_1$ and $A_2$ we take $g_1(X) = 1$ if $X \in A_1$, and $= 0$ otherwise,
and $g_2(Y) = 1$ if $Y \in A_2$, and $= 0$ otherwise, then

\[ E\{g_1(X) g_2(Y)\} = P\{X \in A_1, Y \in A_2\} \]

and $E\{g_1(X)\} = P\{X \in A_1\}$, $E\{g_2(Y)\} = P\{Y \in A_2\}$. Relation (14) implies that for any
Borel sets $A_1$ and $A_2$ of real numbers

\[ P\{X \in A_1, Y \in A_2\} = P\{X \in A_1\}\, P\{Y \in A_2\}. \]

It follows that $X$ and $Y$ are independent if (14) holds. We have thus proved the following
theorem.


Theorem 5. Two RVs $X$ and $Y$ are independent if and only if for every pair of Borel-measurable
functions $g_1$ and $g_2$ the relation

\[ E\{g_1(X) g_2(Y)\} = E\{g_1(X)\}\, E\{g_2(Y)\} \tag{15} \]

holds, provided that the expectations on both sides of (15) exist.
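Theorem 6. Let $X_1, X_2, \ldots, X_n$ be RVs with $E|X_i|^2 < \infty$ for $i = 1, 2, \ldots, n$. Let
$a_1, a_2, \ldots, a_n$ be real numbers and write $S = \sum_{i=1}^{n} a_i X_i$. Then the variance of $S$ exists and
is given by

\[ \operatorname{var}(S) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i) + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ i \ne j}}^{n} a_i a_j \operatorname{cov}(X_i, X_j). \tag{16} \]

If, in particular, $X_1, X_2, \ldots, X_n$ are such that $\operatorname{cov}(X_i, X_j) = 0$ for $i, j = 1, 2, \ldots, n$, $i \ne j$, then

\[ \operatorname{var}(S) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i). \tag{17} \]


Proof. We have

\[ \begin{aligned}
\operatorname{var}(S) &= E\left\{ \sum_{i=1}^{n} a_i (X_i - EX_i) \right\}^2 \\
&= E\left\{ \sum_{i=1}^{n} a_i^2 (X_i - EX_i)^2 + \sum_{i \ne j} a_i a_j (X_i - EX_i)(X_j - EX_j) \right\} \\
&= \sum_{i=1}^{n} a_i^2 E(X_i - EX_i)^2 + \sum_{i \ne j} a_i a_j E\{(X_i - EX_i)(X_j - EX_j)\}.
\end{aligned} \]

If the $X_i$'s satisfy

\[ \operatorname{cov}(X_i, X_j) = 0 \quad \text{for } i, j = 1, 2, \ldots, n;\ i \ne j, \]

the second term on the right side of (16) vanishes, and we have (17).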


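Corollary 1. Let $X_1, X_2, \ldots, X_n$ be exchangeable RVs with $\operatorname{var}(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$.
Then

\[ \operatorname{var}\left( \sum_{i=1}^{n} a_i X_i \right) = \sigma^2 \sum_{i=1}^{n} a_i^2 + \rho \sigma^2 \sum_{i \ne j} a_i a_j, \]

where $\rho$ is the correlation coefficient between $X_i$ and $X_j$, $i \ne j$. In particular,

\[ \operatorname{var}\left( \sum_{i=1}^{n} \frac{X_i}{n} \right) = \frac{\sigma^2}{n} + \frac{n - 1}{n} \rho \sigma^2. \]


Corollary 2. If $X_1, X_2, \ldots, X_n$ are exchangeable and uncorrelated then

\[ \operatorname{var}\left( \sum_{i=1}^{n} a_i X_i \right) = \sigma^2 \sum_{i=1}^{n} a_i^2, \]

and

\[ \operatorname{var}\left( \sum_{i=1}^{n} \frac{X_i}{n} \right) = \frac{\sigma^2}{n}. \]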

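Theorem 7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common variance $\sigma^2$. Also, let
$a_1, a_2, \ldots, a_n$ be real numbers such that $\sum_{i=1}^{n} a_i = 1$, and let $S = \sum_{i=1}^{n} a_i X_i$. Then the variance
of $S$ is least if we choose $a_i = 1/n$, $i = 1, 2, \ldots, n$.


Proof. We have

\[ \operatorname{var}(S) = \sigma^2 \sum_{i=1}^{n} a_i^2, \]

which is least if and only if we choose the $a_i$'s so that $\sum_{i=1}^{n} a_i^2$ is smallest, subject to the
condition $\sum_{i=1}^{n} a_i = 1$. We have

\[ \sum_{i=1}^{n} a_i^2 = \sum_{i=1}^{n} \left( a_i - \frac{1}{n} + \frac{1}{n} \right)^2 = \sum_{i=1}^{n} \left( a_i - \frac{1}{n} \right)^2 + \frac{1}{n}, \]

which is minimized for the choice $a_i = 1/n$, $i = 1, 2, \ldots, n$.
Note that the result holds if we replace independence by the condition that $X_i$'s are
exchangeable and uncorrelated.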


Example 4. Suppose that $r$ balls are drawn one at a time without replacement from a bag
containing $n$ white and $m$ black balls. Let $S_r$ be the number of black balls drawn.


Let us define RVs $X_k$ as follows:

\[ X_k = \begin{cases} 1 & \text{if the $k$th ball drawn is black,} \\ 0 & \text{if the $k$th ball drawn is white,} \end{cases} \qquad k = 1, 2, \ldots, r. \]

Then

\[ S_r = X_1 + X_2 + \cdots + X_r. \]

Also

\[ P\{X_k = 1\} = \frac{m}{m + n}, \qquad P\{X_k = 0\} = \frac{n}{m + n}. \tag{18} \]

Thus $EX_k = m/(m + n)$ and

\[ \operatorname{var}(X_k) = \frac{m}{m + n} - \frac{m^2}{(m + n)^2} = \frac{mn}{(m + n)^2}. \]

To compute $\operatorname{cov}(X_j, X_k)$, $j \ne k$, note that the RV $X_j X_k = 1$ if the $j$th and the $k$th balls drawn
are black, and $= 0$ otherwise. Thus

\[ E(X_j X_k) = P\{X_j = 1, X_k = 1\} = \frac{m}{m + n} \cdot \frac{m - 1}{m + n - 1} \tag{19} \]

and

\[ \operatorname{cov}(X_j, X_k) = -\frac{mn}{(m + n)^2 (m + n - 1)}. \]

Thus

\[ ES_r = \sum_{k=1}^{r} EX_k = \frac{mr}{m + n} \]

and

\[ \operatorname{var}(S_r) = r \frac{mn}{(m + n)^2} - r(r - 1) \frac{mn}{(m + n)^2 (m + n - 1)} = \frac{mnr}{(m + n)^2 (m + n - 1)} (m + n - r). \]

The reader is asked to verify that (18) and (19) hold.
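
The mean and variance of $S_r$ can be confirmed against the exact distribution of the number of black balls (the hypergeometric PMF). The sketch below is a check for one hypothetical choice of $m$, $n$, and $r$; the function name is an assumption made for illustration.

```python
from fractions import Fraction as F
from math import comb


def black_count_pmf(m, n, r):
    """Exact PMF of the number of black balls in r draws without replacement
    from a bag with m black and n white balls."""
    total = comb(m + n, r)
    return {
        s: F(comb(m, s) * comb(n, r - s), total)
        for s in range(max(0, r - n), min(m, r) + 1)
    }


m, n, r = 5, 7, 4  # hypothetical bag composition and number of draws
pmf = black_count_pmf(m, n, r)
mean = sum(s * p for s, p in pmf.items())
var = sum((s - mean) ** 2 * p for s, p in pmf.items())

mean_formula = F(m * r, m + n)
var_formula = F(m * n * r * (m + n - r), (m + n) ** 2 * (m + n - 1))
print(mean, mean_formula)
print(var, var_formula)
```

Both printed pairs agree exactly, since all arithmetic is rational.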
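Example 5. Let $X_1, X_2, \ldots, X_n$ be independent, and $a_1, a_2, \ldots, a_n$ be real numbers such that
$\sum_{i=1}^{n} a_i = 1$. Assume that $E|X_i^2| < \infty$, $i = 1, 2, \ldots, n$, and let $\operatorname{var}(X_i) = \sigma_i^2$, $i = 1, 2, \ldots, n$.
Write $S = \sum_{i=1}^{n} a_i X_i$. Then $\operatorname{var}(S) = \sum_{i=1}^{n} a_i^2 \sigma_i^2 = \sigma$, say. To find weights $a_i$ such that $\sigma$
is minimum, we write

\[ \sigma = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + (1 - a_1 - a_2 - \cdots - a_{n-1})^2 \sigma_n^2 \]

and differentiate partially with respect to $a_1, a_2, \ldots, a_{n-1}$, respectively. We get

\[ \begin{aligned}
\frac{\partial \sigma}{\partial a_1} &= 2a_1 \sigma_1^2 - 2(1 - a_1 - a_2 - \cdots - a_{n-1}) \sigma_n^2 = 0, \\
&\ \ \vdots \\
\frac{\partial \sigma}{\partial a_{n-1}} &= 2a_{n-1} \sigma_{n-1}^2 - 2(1 - a_1 - a_2 - \cdots - a_{n-1}) \sigma_n^2 = 0.
\end{aligned} \]

It follows that

\[ a_j \sigma_j^2 = a_n \sigma_n^2, \qquad j = 1, 2, \ldots, n - 1, \]

that is, the weights $a_j$, $j = 1, 2, \ldots, n$, should be chosen proportional to $1/\sigma_j^2$. The minimum
value of $\sigma$ is then

\[ \sigma_{\min} = \sum_{i=1}^{n} \frac{k^2}{\sigma_i^4} \sigma_i^2 = k^2 \sum_{i=1}^{n} \frac{1}{\sigma_i^2}, \]

where $k$ is given by $\sum_{j=1}^{n} (k/\sigma_j^2) = 1$. Thus

\[ \sigma_{\min} = \frac{1}{\sum_{j=1}^{n} (1/\sigma_j^2)} = \frac{H}{n}, \]

where $H$ is the harmonic mean of the $\sigma_j^2$.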
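
A short exact computation illustrates the inverse-variance weighting rule and the value $1/\sum_j (1/\sigma_j^2)$ of the minimum. This is a sketch only; the variances below are hypothetical and the function names are assumptions.

```python
from fractions import Fraction as F


def optimal_weights(variances):
    """Weights proportional to 1 / sigma_j^2, normalized to sum to 1."""
    inverse = [1 / F(v) for v in variances]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_variance(weights, variances):
    """var(sum a_i X_i) for independent X_i with the given variances."""
    return sum(a * a * v for a, v in zip(weights, variances))


variances = [F(1), F(4), F(2)]  # hypothetical sigma_i^2
weights = optimal_weights(variances)
best = weighted_variance(weights, variances)
equal = weighted_variance([F(1, 3)] * 3, variances)

print(weights, best, equal)
```

For these variances the optimal weights are 4/7, 1/7, 2/7 with variance 4/7, compared with 7/9 for equal weights.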


We conclude this section with some important moment inequalities. We begin with the
simple inequality

\[ |a + b|^r \le c_r (|a|^r + |b|^r), \tag{20} \]

where $c_r = 1$ for $0 \le r \le 1$ and $= 2^{r-1}$ for $r > 1$. For $r = 0$ and $r = 1$, (20) is trivially true.
First note that it is sufficient to prove (20) when $0 \le a \le b$. Let $0 < a \le b$, and write
$x = a/b$. Then

\[ \frac{(a + b)^r}{a^r + b^r} = \frac{(1 + x)^r}{1 + x^r}. \]

Writing $f(x) = (1 + x)^r/(1 + x^r)$, we see that

\[ f'(x) = \frac{r(1 + x)^{r-1}}{(1 + x^r)^2} (1 - x^{r-1}), \]

where $0 < x \le 1$. It follows that $f'(x) > 0$ if $r > 1$, $= 0$ if $r = 1$, and $< 0$ if $r < 1$. Thus

\[ \max_{0 \le x \le 1} f(x) = f(0) = 1 \quad \text{if } r \le 1, \]

while

\[ \max_{0 \le x \le 1} f(x) = f(1) = 2^{r-1} \quad \text{if } r \ge 1. \]

Note that $|a + b|^r \le 2^r (|a|^r + |b|^r)$ is trivially true since

\[ |a + b| \le \max(2|a|, 2|b|). \]

An immediate application of (20) is the following result.


Theorem 8. Let $X$ and $Y$ be RVs and $r > 0$ be a fixed number. If $E|X|^r$, $E|Y|^r$ are both
finite, so also is $E|X + Y|^r$.


Proof. Let $a = X$ and $b = Y$ in (20). Taking the expectation on both sides, we see that

\[ E|X + Y|^r \le c_r (E|X|^r + E|Y|^r), \]

where $c_r = 1$ if $0 < r \le 1$ and $= 2^{r-1}$ if $r > 1$.
Next we establish Hölder's inequality,

\[ |xy| \le \frac{|x|^p}{p} + \frac{|y|^q}{q}, \tag{21} \]

where $p$ and $q$ are positive real numbers such that $p > 1$ and $1/p + 1/q = 1$. Note that for
$x > 0$ the function $w = \log x$ is concave. It follows that for $x_1, x_2 > 0$ and $0 \le t \le 1$

\[ \log[t x_1 + (1 - t) x_2] \ge t \log x_1 + (1 - t) \log x_2. \]

Taking antilogarithms, we get

\[ x_1^t x_2^{1-t} \le t x_1 + (1 - t) x_2. \]

Now we choose $x_1 = |x|^p$, $x_2 = |y|^q$, $t = 1/p$, $1 - t = 1/q$, where $p > 1$ and $1/p + 1/q = 1$,
to get (21).


Theorem 9. Let $p > 1$, $q > 1$ so that $1/p + 1/q = 1$. Then

\[ E|XY| \le (E|X|^p)^{1/p} (E|Y|^q)^{1/q}. \tag{22} \]

Proof. By Hölder's inequality, letting $x = X\{E|X|^p\}^{-1/p}$, $y = Y\{E|Y|^q\}^{-1/q}$, we get

\[ |XY| \le p^{-1} |X|^p \{E|X|^p\}^{1/p - 1} \{E|Y|^q\}^{1/q} + q^{-1} |Y|^q \{E|Y|^q\}^{1/q - 1} \{E|X|^p\}^{1/p}. \]

Taking the expectation on both sides leads to (22).
Corollary. Taking $p = q = 2$, we obtain the Cauchy-Schwarz inequality,

\[ E|XY| \le E^{1/2}|X|^2\, E^{1/2}|Y|^2. \]
The final result of this section is an inequality due to Minkowski.
Theorem 10. For $p \ge 1$,

\[ \{E|X + Y|^p\}^{1/p} \le \{E|X|^p\}^{1/p} + \{E|Y|^p\}^{1/p}. \tag{23} \]

Proof. We have, for $p > 1$,

\[ |X + Y|^p \le |X|\, |X + Y|^{p-1} + |Y|\, |X + Y|^{p-1}. \]

Taking expectations and using Hölder's inequality with $Y$ replaced by $|X + Y|^{p-1}$ ($p > 1$),
we have

\[ \begin{aligned}
E|X + Y|^p &\le \{E|X|^p\}^{1/p} \{E|X + Y|^{(p-1)q}\}^{1/q} + \{E|Y|^p\}^{1/p} \{E|X + Y|^{(p-1)q}\}^{1/q} \\
&= \left[ \{E|X|^p\}^{1/p} + \{E|Y|^p\}^{1/p} \right] \cdot \{E|X + Y|^{(p-1)q}\}^{1/q}.
\end{aligned} \]

Excluding the trivial case in which $E|X + Y|^p = 0$, and noting that $(p - 1)q = p$, we have,
after dividing both sides of the last inequality by $\{E|X + Y|^p\}^{1/q}$,

\[ \{E|X + Y|^p\}^{1/p} \le \{E|X|^p\}^{1/p} + \{E|Y|^p\}^{1/p}, \qquad p > 1. \]

The case $p = 1$ being trivial, this establishes (23).
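
Inequalities (22) and (23) can be illustrated on a small discrete distribution. The sketch below evaluates both sides for one exponent $p$; the joint PMF, the value of $p$, and the helper names `E` and `norm` are hypothetical choices made for illustration, not part of the text.

```python
# Hypothetical joint PMF of (X, Y), used only to illustrate (22) and (23).
pmf = {(-1.0, 2.0): 0.2, (0.5, -1.5): 0.3, (2.0, 1.0): 0.4, (3.0, -0.5): 0.1}


def E(g):
    """Expectation of g(X, Y) under the joint PMF above."""
    return sum(prob * g(x, y) for (x, y), prob in pmf.items())


def norm(g, p):
    """{E|g(X, Y)|^p}^(1/p)."""
    return E(lambda x, y: abs(g(x, y)) ** p) ** (1.0 / p)


p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1

holder_lhs = E(lambda x, y: abs(x * y))
holder_rhs = norm(lambda x, y: x, p) * norm(lambda x, y: y, q)

minkowski_lhs = norm(lambda x, y: x + y, p)
minkowski_rhs = norm(lambda x, y: x, p) + norm(lambda x, y: y, p)

print(holder_lhs, holder_rhs)
print(minkowski_lhs, minkowski_rhs)
```

In each printed pair the first value should not exceed the second.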


PROBLEMS 4.5 


1. Suppose that the RV $(X,Y)$ is uniformly distributed over the region $R = \{(x,y) :
0 < x < y < 1\}$. Find the covariance between $X$ and $Y$.


2. Let $(X,Y)$ have the joint PDF given by

\[ f(x,y) = \begin{cases} x^2 + \dfrac{xy}{3} & \text{if } 0 < x < 1,\ 0 < y < 2, \\ 0 & \text{otherwise.} \end{cases} \]

Find all moments of order 2.


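3. Let $(X,Y)$ be distributed with joint density

\[ f(x,y) = \begin{cases} \tfrac{1}{4}[1 + xy(x^2 - y^2)] & \text{if } |x| \le 1,\ |y| \le 1, \\ 0 & \text{otherwise.} \end{cases} \]

Find the MGF of $(X,Y)$. Are $X$, $Y$ independent? If not, find the covariance between
$X$ and $Y$.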


4. For a positive RV $X$ with finite first moment show that (1) $E\sqrt{X} \le \sqrt{EX}$ and
(2) $E\{1/X\} \ge 1/EX$.


5. If $X$ is a nondegenerate RV with finite expectation and such that $X \ge a > 0$, then

\[ E\{\sqrt{X^2 - a^2}\} < \sqrt{(EX)^2 - a^2}. \]

(Kruskal [56])
6. Show that for $x > 0$

\[ \left( \int_x^{\infty} t e^{-t^2/2}\, dt \right)^2 \le \int_x^{\infty} e^{-t^2/2}\, dt \int_x^{\infty} t^2 e^{-t^2/2}\, dt, \]

and hence that

\[ \int_x^{\infty} e^{-t^2/2}\, dt \ge \frac{1}{2} \left[ (4 + x^2)^{1/2} - x \right] e^{-x^2/2}. \]

7. Given a PDF $f$ that is nondecreasing in the interval $a < x < b$, show that for any
$s > 0$


b pst _ qt b 
| Xx (x) dx = aoeca | f (x) dx, 


with the inequality reversed if f is nonincreasing. 


8. Derive the Lyapunov inequality (Theorem 3.4.3)

\[ \{E|X|^r\}^{1/r} \le \{E|X|^s\}^{1/s}, \qquad 1 < r < s < \infty, \]

from Hölder's inequality (22).


9. Let $X$ be an RV with $E|X|^r < \infty$ for $r > 0$. Show that the function $\log E|X|^r$ is a
convex function of $r$.

10. Show with the help of an example that Theorem 9 is not true for $p < 1$.

11. Show that the converse of Theorem 8 also holds for independent RVs, that is, if
$E|X + Y|^r < \infty$ for some $r > 0$ and $X$ and $Y$ are independent, then $E|X|^r < \infty$,
$E|Y|^r < \infty$.

[Hint: Without loss of generality assume that the median of both $X$ and $Y$ is 0.
Show that, for any $t > 0$, $P\{|X + Y| > t\} \ge (1/2) P\{|X| > t\}$. Now use the remarks
preceding Lemma 3.2.2 to conclude that $E|X|^r < \infty$.]

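12. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and $A_1, A_2, \ldots, A_n$ be events in $\mathcal{S}$ such that
$P\left( \bigcup_{k=1}^{n} A_k \right) > 0$. Show that

\[ 2 \sum_{1 \le j < k \le n} P(A_j A_k) \ge \frac{\left( \sum_{k=1}^{n} PA_k \right)^2}{P\left( \bigcup_{k=1}^{n} A_k \right)} - \sum_{k=1}^{n} PA_k. \]

(Chung and Erdős [14])
[Hint: Let $X_k$ be the indicator function of $A_k$, $k = 1, 2, \ldots, n$. Use the Cauchy-Schwarz
inequality.]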
13. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and $A, B \in \mathcal{S}$ with $0 < PA < 1$, $0 < PB < 1$.
Define $\rho(A,B)$ by $\rho(A,B)$ = correlation coefficient between RVs $I_A$ and $I_B$, where
$I_A$, $I_B$ are the indicator functions of $A$ and $B$, respectively. Express $\rho(A,B)$ in terms
of $PA$, $PB$, and $P(AB)$ and conclude that $\rho(A,B) = 0$ if and only if $A$ and $B$ are
independent. What happens if $A = B$ or if $A = B^c$?


(a) Show that 


p(A,B) > 0 P{A | B} > P(A) = P{B| A} > P(B) 
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and 
p(A,B) <0 P{A |B} <PA<=P{B|A} < PB. 
(b) Show that 


P(AB) P(A‘B°) — P(AB®) P(A‘B) 
(PA PAS - PB PB°)!/? 


p(A,B) = 


14. Let $X_1, X_2, \ldots, X_n$ be iid RVs and define

$$\bar{X} = \frac{\sum_{i=1}^n X_i}{n}, \qquad S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}.$$

Suppose that the common distribution is symmetric. Assuming the existence of moments of appropriate order, show that $\operatorname{cov}(\bar{X}, S^2) = 0$.


15. Let X, Y be iid RVs with common standard normal density

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty.$$

Let $U = X + Y$ and $V = X^2 + Y^2$. Find the MGF of the random variable (U, V). Also, find the correlation coefficient between U and V. Are U and V independent?


16. Let X and Y be two discrete RVs:

$$P\{X = x_1\} = p_1, \qquad P\{X = x_2\} = 1 - p_1;$$

and

$$P\{Y = y_1\} = p_2, \qquad P\{Y = y_2\} = 1 - p_2.$$

Show that X and Y are independent if and only if the correlation coefficient between X and Y is 0.

17. Let X and Y be dependent RVs with common means 0, variances 1, and correlation coefficient $\rho$. Show that

$$E\{\max(X^2, Y^2)\} \le 1 + \sqrt{1 - \rho^2}.$$


18. Let $X_1, X_2$ be independent normal RVs with density functions

$$f_i(x) = \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x - \mu_i}{\sigma_i}\right)^2\right\}, \qquad -\infty < x < \infty;\ i = 1,2.$$

Also let

$$Z = X_1 \cos\theta + X_2 \sin\theta \quad \text{and} \quad W = X_2 \cos\theta - X_1 \sin\theta.$$

Find the correlation coefficient between Z and W and show that 


where p denotes the correlation coefficient between Z and W. 

19. Let $(X_1, X_2, \ldots, X_n)$ be an RV such that the correlation coefficient between each pair $X_i, X_j$, $i \ne j$, is $\rho$. Show that $-(n-1)^{-1} \le \rho \le 1$.

20. Let $X_1, X_2, \ldots, X_{m+n}$ be iid RVs with finite second moment. Let $S_k = \sum_{j=1}^k X_j$, k = 1,2,...,m+n. Find the correlation coefficient between $S_n$ and $S_{m+n} - S_m$, where n > m.

21. Let f be the PDF of a positive RV, and write

$$g(x,y) = \begin{cases} \dfrac{f(x+y)}{x+y} & \text{if } x > 0,\ y > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Show that g is a density function in the plane. If the mth moment of f exists for some positive integer m, find $EX^m$. Compute the means and variances of X and Y and the correlation coefficient between X and Y in terms of moments of f. (Adapted from Feller [26, p. 100].)

22. A die is thrown n + 2 times. After each throw a + sign is recorded for 4, 5, or 6, and a − sign for 1, 2, or 3, the signs forming an ordered sequence. Each sign, except the first and the last, is attached to a characteristic RV that assumes the value 1 if both the neighboring signs differ from the one between them and 0 otherwise. Let $X_1, X_2, \ldots, X_n$ be these characteristic RVs, where $X_i$ corresponds to the (i+1)st sign (i = 1,2,...,n) in the sequence. Show that

$$E\left\{\sum_{i=1}^n X_i\right\} = \frac{n}{4} \quad \text{and} \quad \operatorname{var}\left\{\sum_{i=1}^n X_i\right\} = \frac{5n - 2}{16}.$$

23. Let (X, Y) be jointly distributed with PDF f defined by $f(x,y) = \frac{1}{2}$ inside the square with corners at the points (0,1), (1,0), (−1,0), (0,−1) in the (x, y)-plane, and f(x,y) = 0 otherwise. Are X, Y independent? Are they uncorrelated?


4.6 CONDITIONAL EXPECTATION 


In Section 4.2 we defined the conditional distribution of an RV X, given Y. We showed that, if (X, Y) is of the discrete type, the conditional PMF of X, given $Y = y_j$, where $P\{Y = y_j\} > 0$, is a PMF when considered as a function of the $x_i$'s (for fixed $y_j$). Similarly, if (X, Y) is an RV of the continuous type with PDF f(x,y) and marginal densities $f_1$ and $f_2$, respectively, then, at every point (x,y) at which f is continuous and at which $f_2(y) > 0$ and is continuous, a conditional density function of X, given Y, exists and may be defined by

$$f_{X|Y}(x \mid y) = \frac{f(x,y)}{f_2(y)}.$$

We also showed that $f_{X|Y}(x \mid y)$, for fixed y, when considered as a function of x is a PDF in its own right. Therefore, we can (and do) consider the moments of this conditional distribution.


Definition 1. Let X and Y be RVs defined on a probability space $(\Omega, \mathcal{S}, P)$, and let h be a Borel-measurable function. Then the conditional expectation of h(X), given Y, written as E{h(X) | Y}, is an RV that takes the value E{h(X) | y}, defined by

$$E\{h(X) \mid y\} = \begin{cases} \sum_x h(x)\, P\{X = x \mid Y = y\} & \text{if } (X,Y) \text{ is of the discrete type and } P\{Y = y\} > 0, \\[4pt] \int_{-\infty}^{\infty} h(x)\, f_{X|Y}(x \mid y)\,dx & \text{if } (X,Y) \text{ is of the continuous type and } f_2(y) > 0, \end{cases} \tag{1}$$

when the RV Y assumes the value y.


Needless to say, a similar definition may be given for the conditional expectation 
E{h(Y) | X}. 

It is immediate that E{h(X) | Y} satisfies the usual properties of an expectation provided 
we remember that E{h(X) | Y} is not a constant but an RV. The following results are easy 
to prove. We assume existence of indicated expectations. 


$$E\{c \mid Y\} = c, \quad \text{for any constant } c. \tag{2}$$

$$E\{[a_1 g_1(X) + a_2 g_2(X)] \mid Y\} = a_1 E\{g_1(X) \mid Y\} + a_2 E\{g_2(X) \mid Y\}, \tag{3}$$

for any Borel functions $g_1$, $g_2$.

$$P(X \ge 0) = 1 \implies E\{X \mid Y\} \ge 0. \tag{4}$$

$$P(X_1 \ge X_2) = 1 \implies E\{X_1 \mid Y\} \ge E\{X_2 \mid Y\}. \tag{5}$$

The statements in (3), (4), and (5) should be understood to hold with probability 1.

$$E\{X \mid Y\} = E(X), \qquad E\{Y \mid X\} = E(Y) \tag{6}$$

for independent RVs X and Y.

If $\phi(X, Y)$ is a function of X and Y, then

$$E\{\phi(X,Y) \mid y\} = E\{\phi(X,y) \mid y\}, \tag{7}$$

$$E\{\psi(X)\,\phi(X,Y) \mid X\} = \psi(X)\, E\{\phi(X,Y) \mid X\} \tag{8}$$

for any Borel functions $\psi$ and $\phi$.

Again (8) should be understood as holding with probability 1. Relation (7) is useful as 
a computational device. See Example 3 below. 

The moments of a conditional distribution are defined in the usual manner. Thus, for r ≥ 0, $E\{X^r \mid Y\}$ defines the rth moment of the conditional distribution. We can define the central moments of the conditional distribution and, in particular, the variance. There is no difficulty in generalizing these concepts for n-dimensional distributions when n > 2. We leave the reader to furnish the details.


Example 1. An urn contains three red and two green balls. A random sample of two balls is drawn (a) with replacement and (b) without replacement. Let X = 0 if the first ball drawn is green, = 1 if the first ball drawn is red, and let Y = 0 if the second ball drawn is green, = 1 if the second ball drawn is red.

The joint PMF of (X, Y) is given in the following tables: 


(a) With Replacement                      (b) Without Replacement

           x = 0    x = 1  | Total                x = 0    x = 1  | Total
  y = 0    4/25     6/25   |  2/5        y = 0    2/20     6/20   |  2/5
  y = 1    6/25     9/25   |  3/5        y = 1    6/20     6/20   |  3/5
  Total    2/5      3/5    |  1          Total    2/5      3/5    |  1

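The conditional PMFs and the conditional expectations are as follows:

(a) With replacement,

$$P\{X = x \mid Y = 0\} = \begin{cases} 2/5, & x = 0, \\ 3/5, & x = 1, \end{cases} \qquad P\{Y = y \mid X = 0\} = \begin{cases} 2/5, & y = 0, \\ 3/5, & y = 1, \end{cases}$$

$$P\{X = x \mid Y = 1\} = \begin{cases} 2/5, & x = 0, \\ 3/5, & x = 1, \end{cases} \qquad P\{Y = y \mid X = 1\} = \begin{cases} 2/5, & y = 0, \\ 3/5, & y = 1, \end{cases}$$

$$E\{X \mid y\} = \begin{cases} 3/5, & y = 0, \\ 3/5, & y = 1, \end{cases} \qquad E\{Y \mid x\} = \begin{cases} 3/5, & x = 0, \\ 3/5, & x = 1; \end{cases}$$

(b) Without replacement,

$$P\{X = x \mid Y = 0\} = \begin{cases} 1/4, & x = 0, \\ 3/4, & x = 1, \end{cases} \qquad P\{Y = y \mid X = 0\} = \begin{cases} 1/4, & y = 0, \\ 3/4, & y = 1, \end{cases}$$

$$P\{X = x \mid Y = 1\} = \begin{cases} 1/2, & x = 0, \\ 1/2, & x = 1, \end{cases} \qquad P\{Y = y \mid X = 1\} = \begin{cases} 1/2, & y = 0, \\ 1/2, & y = 1, \end{cases}$$

$$E\{X \mid y\} = \begin{cases} 3/4, & y = 0, \\ 1/2, & y = 1, \end{cases} \qquad E\{Y \mid x\} = \begin{cases} 3/4, & x = 0, \\ 1/2, & x = 1. \end{cases}$$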
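The urn computation of Example 1 can be reproduced mechanically. The following is a minimal sketch in Python using exact rational arithmetic; the function names `joint_pmf` and `cond_expectation_x_given_y` are illustrative choices and not notation from the text.

```python
from fractions import Fraction

def joint_pmf(with_replacement):
    """Joint PMF of (X, Y) for an urn with 3 red and 2 green balls.

    X (first draw) and Y (second draw) equal 1 for red and 0 for green.
    """
    red, green = 3, 2
    total = red + green
    pmf = {}
    for x in (0, 1):
        p_first = Fraction(red if x == 1 else green, total)
        # Composition of the urn at the time of the second draw.
        r2, g2, t2 = red, green, total
        if not with_replacement:
            r2 -= x          # one red ball fewer if the first ball was red
            g2 -= 1 - x      # one green ball fewer if the first ball was green
            t2 -= 1
        for y in (0, 1):
            pmf[(x, y)] = p_first * Fraction(r2 if y == 1 else g2, t2)
    return pmf

def cond_expectation_x_given_y(pmf, y):
    """E{X | Y = y} = sum_x x P{X = x, Y = y} / P{Y = y}."""
    p_y = sum(p for (_, yy), p in pmf.items() if yy == y)
    return sum(x * p for (x, yy), p in pmf.items() if yy == y) / p_y

for flag in (True, False):
    pmf = joint_pmf(flag)
    print(flag, [cond_expectation_x_given_y(pmf, y) for y in (0, 1)])
```

With replacement the conditional expectation does not depend on y, as expected for independent draws; without replacement it does.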


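Example 2. For the RV (X, Y) considered in Examples 4.2.5 and 4.2.7

$$E\{Y \mid x\} = \int_x^1 y\, f_{Y|X}(y \mid x)\,dy = \frac{1}{2}\,\frac{1 - x^2}{1 - x} = \frac{1 + x}{2}, \qquad 0 < x < 1,$$

and

$$E\{X \mid y\} = \int_0^y x\, f_{X|Y}(x \mid y)\,dx = \frac{y}{2}, \qquad 0 < y < 1.$$

Also,

$$E\{X^2 \mid y\} = \int_0^y x^2\, \frac{1}{y}\,dx = \frac{y^2}{3}, \qquad 0 < y < 1,$$

and

$$\operatorname{var}\{X \mid y\} = E\{X^2 \mid y\} - [E\{X \mid y\}]^2 = \frac{y^2}{3} - \frac{y^2}{4} = \frac{y^2}{12}, \qquad 0 < y < 1.$$
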
Theorem 1. Let Eh(X) exist. Then,

$$Eh(X) = E\{E\{h(X) \mid Y\}\}. \tag{9}$$

Proof. Let (X, Y) be of the discrete type. Then,

$$E\{E\{h(X) \mid Y\}\} = \sum_y \left\{\sum_x h(x)\, P\{X = x \mid Y = y\}\right\} P\{Y = y\}$$
$$= \sum_y \sum_x h(x)\, P\{X = x, Y = y\}$$
$$= \sum_x h(x) \sum_y P\{X = x, Y = y\} = \sum_x h(x)\, P\{X = x\}$$
$$= Eh(X).$$

The proof in the continuous case is similar.

Theorem 1 is quite useful in computation of Eh(X) in many applications.
Example 3. Let X and Y be independent continuous type RVs with respective PDFs f and g, and DFs F and G. Then P{X < Y} is of interest in many statistical applications. In view of Theorem 1

$$P\{X < Y\} = E\{I_{[X<Y]}\} = E\{E\{I_{[X<Y]} \mid Y\}\},$$

where $I_A$ is the indicator function of event A. Now

$$E\{I_{[X<Y]} \mid Y = y\} = E\{I_{[X<y]} \mid y\} = E(I_{[X<y]}) = F(y)$$

and it follows that

$$P\{X < Y\} = E\{F(Y)\} = \int_{-\infty}^{\infty} F(y)\, g(y)\,dy.$$

If, in particular, $X \stackrel{d}{=} Y$, then

$$P\{X < Y\} = \int_{-\infty}^{\infty} F(y)\, f(y)\,dy = \frac{1}{2}.$$

More generally,

$$P\{X - Y \le z\} = E\{E\{I_{[X-Y \le z]} \mid Y\}\} = E\{F(Y + z)\} = \int_{-\infty}^{\infty} F(y + z)\, g(y)\,dy$$

gives the DF of Z = X − Y as computed in the corollary to Theorem 4.4.3.
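As a numerical illustration of the formula $P\{X < Y\} = \int F(y) g(y)\,dy$, the sketch below evaluates the integral by the midpoint rule. The choice of exponential distributions with rates `a` and `b` is an assumption made only for this illustration (for that choice the probability is known to be a/(a+b)); the helper name `prob_x_less_than_y` is likewise hypothetical.

```python
import math

def prob_x_less_than_y(F, g, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of F(y) g(y) over (0, upper).

    Assumes Y is nonnegative and that the tail beyond `upper` is negligible.
    """
    h = upper / steps
    return h * sum(F((i + 0.5) * h) * g((i + 0.5) * h) for i in range(steps))

a, b = 1.0, 2.0  # assumed rates: X ~ exponential(a), Y ~ exponential(b)

def F(y):
    """DF of X."""
    return 1.0 - math.exp(-a * y)

def g(y):
    """PDF of Y."""
    return b * math.exp(-b * y)

def f(y):
    """PDF of X, used for the identically distributed case."""
    return a * math.exp(-a * y)

approx = prob_x_less_than_y(F, g, upper=40.0)      # close to a / (a + b)
approx_same = prob_x_less_than_y(F, f, upper=40.0)  # close to 1/2
print(approx, approx_same)
```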
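Example 4. Consider the joint PDF

$$f(x,y) = x e^{-x(1+y)}, \quad x > 0,\ y > 0, \text{ and } 0 \text{ otherwise}$$

of (X, Y). Then

$$f_X(x) = e^{-x}, \quad x > 0, \text{ and } 0 \text{ otherwise},$$

$$f_Y(y) = \frac{1}{(1+y)^2}, \quad y > 0, \text{ and } 0 \text{ otherwise}.$$

Clearly, EY does not exist but

$$E\{Y \mid x\} = \int_0^\infty y\, x e^{-xy}\,dy = \frac{1}{x}.$$
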
Theorem 2. If $EX^2 < \infty$, then

$$\operatorname{var}(X) = \operatorname{var}(E\{X \mid Y\}) + E(\operatorname{var}\{X \mid Y\}). \tag{10}$$

Proof. The right-hand side of (10) equals, by definition,

$$\{E(E\{X \mid Y\})^2 - [E(E\{X \mid Y\})]^2\} + E(E\{X^2 \mid Y\} - (E\{X \mid Y\})^2)$$
$$= \{E(E\{X \mid Y\})^2 - (EX)^2\} + EX^2 - E(E\{X \mid Y\})^2$$
$$= \operatorname{var}(X).$$

Corollary. If $EX^2 < \infty$, then

$$\operatorname{var}(X) \ge \operatorname{var}(E\{X \mid Y\}) \tag{11}$$

with equality if and only if X is a function of Y.

Equation (11) follows immediately from (10). The equality in (11) holds if and only if

$$E(\operatorname{var}\{X \mid Y\}) = E(X - E\{X \mid Y\})^2 = 0,$$

which holds if and only if with probability 1

$$X = E\{X \mid Y\}. \tag{12}$$


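Example 5. Let $X_1, X_2, \ldots$ be iid RVs and let N be a positive integer-valued RV. Let $S_N = \sum_{k=1}^N X_k$ and suppose that the X's and N are independent. Then,

$$E(S_N) = E\{E\{S_N \mid N\}\}.$$

Now,

$$E\{S_N \mid N = n\} = E\{S_n \mid N = n\} = n\,EX_1$$

so that

$$E(S_N) = E\{N\,EX_1\} = (EN)(EX_1).$$

Again, we have assumed above and below that all indicated expectations exist. Also,

$$\operatorname{var}(S_N) = \operatorname{var}(E\{S_N \mid N\}) + E(\operatorname{var}\{S_N \mid N\}).$$

First,

$$\operatorname{var}(E\{S_N \mid N\}) = \operatorname{var}(N\,EX_1) = (EX_1)^2 \operatorname{var}(N).$$

Second,

$$\operatorname{var}\{S_N \mid N = n\} = n \operatorname{var}(X_1)$$

so

$$E(\operatorname{var}\{S_N \mid N\}) = (EN) \operatorname{var}(X_1).$$

It follows that

$$\operatorname{var}(S_N) = (EX_1)^2 \operatorname{var}(N) + (EN) \operatorname{var}(X_1).$$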
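The two identities for a random sum can be checked exactly on a small discrete case. The sketch below builds the PMF of $S_N$ by convolution and compares its mean and variance with $(EN)(EX_1)$ and $(EX_1)^2 \operatorname{var}(N) + (EN)\operatorname{var}(X_1)$. The particular PMFs `pmf_n` and `pmf_x` are arbitrary values assumed for illustration, and all function names are hypothetical.

```python
from fractions import Fraction

def convolve(p, q):
    """PMF of the sum of two independent integer-valued RVs given as dicts."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

def mean(pmf):
    return sum(k * p for k, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((k - m) ** 2 * p for k, p in pmf.items())

def random_sum_pmf(pmf_n, pmf_x):
    """PMF of S_N = X_1 + ... + X_N with N >= 1 independent of the iid X's."""
    total = {}
    s_n = {0: Fraction(1)}  # PMF of the empty sum
    for n in range(1, max(pmf_n) + 1):
        s_n = convolve(s_n, pmf_x)       # PMF of S_n
        weight = pmf_n.get(n, 0)         # P{N = n}
        for s, p in s_n.items():
            total[s] = total.get(s, 0) + weight * p
    return total

# Assumed illustrative distributions.
pmf_n = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
pmf_x = {1: Fraction(1, 2), 2: Fraction(1, 3), 5: Fraction(1, 6)}
pmf_s = random_sum_pmf(pmf_n, pmf_x)
print(mean(pmf_s), var(pmf_s))
```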


PROBLEMS 4.6 


1. Let X be an RV with PDF given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0.$$

Find E{X | a < X < b}, where a and b are constants.

2. (a) Let (X, Y) be jointly distributed with density


1 —4,-y(1+x)7! >0 
faa ee eae 
0, otherwise. 


Find E{Y | X}. 
(b) Do the same for the joint density 


|-& 


(x+3y)e*-”, x,y > 0, 


f(x,y) = 


otherwise. 


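. Let (X, Y) be jointly distributed with bivariate normal density

$$f(x,y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right]\right\}.$$

Find E{X | y} and E{Y | x}. (Here, $\mu_1, \mu_2 \in \mathcal{R}$, $\sigma_1, \sigma_2 > 0$, and $|\rho| < 1$.)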


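. Find $E\{Y - E\{Y \mid X\}\}^2$.
. Show that $E(Y - \phi(X))^2$ is minimized by choosing $\phi(X) = E\{Y \mid X\}$.
. Let X have PMF

$$P_\lambda(X = x) = \lambda^x e^{-\lambda}/x!, \qquad x = 0,1,2,\ldots$$

and suppose that $\lambda$ is a realization of an RV $\Lambda$ with PDF

$$f(\lambda) = e^{-\lambda}, \qquad \lambda > 0.$$

Find $E\{e^{-\Lambda} \mid X = 1\}$.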


. Find E(XY) by conditioning on X or Y for the following cases:

(a) $f(x,y) = x e^{-x(1+y)}$, x > 0, y > 0, and 0 otherwise.
(b) $f(x,y) = 2$, $0 \le y \le x \le 1$, and zero otherwise.

. Suppose X has uniform PDF f(x) = 1, 0 < x < 1, and 0 otherwise. Let Y be chosen from interval (0, X] according to PDF

$$g(y \mid x) = \frac{1}{x}, \qquad 0 < y \le x, \text{ and } 0 \text{ otherwise}.$$

Find $E\{Y^k \mid X\}$ and $EY^k$ for any fixed constant k > 0.
4.7 ORDER STATISTICS AND THEIR DISTRIBUTIONS


Let $(X_1, X_2, \ldots, X_n)$ be an n-dimensional random variable and $(x_1, x_2, \ldots, x_n)$ be an n-tuple assumed by $(X_1, X_2, \ldots, X_n)$. Arrange $(x_1, x_2, \ldots, x_n)$ in increasing order of magnitude so that

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)},$$

where $x_{(1)} = \min(x_1, x_2, \ldots, x_n)$, $x_{(2)}$ is the second smallest value in $x_1, x_2, \ldots, x_n$, and so on, $x_{(n)} = \max(x_1, x_2, \ldots, x_n)$. If any two $x_i, x_j$ are equal, their order does not matter.


Definition 1. The function $X_{(k)}$ of $(X_1, X_2, \ldots, X_n)$ that takes on the value $x_{(k)}$ in each possible sequence $(x_1, x_2, \ldots, x_n)$ of values assumed by $(X_1, X_2, \ldots, X_n)$ is known as the kth order statistic or statistic of order k. $\{X_{(1)}, X_{(2)}, \ldots, X_{(n)}\}$ is called the set of order statistics for $(X_1, X_2, \ldots, X_n)$.


Example 1. Let $X_1, X_2, X_3$ be three RVs of the discrete type. Also, let $X_1$, $X_3$ take on values 0, 1, and $X_2$ take on values 1, 2, 3. Then the RV $(X_1, X_2, X_3)$ assumes these triplets of values: (0,1,0), (0,2,0), (0,3,0), (0,1,1), (0,2,1), (0,3,1), (1,1,0), (1,2,0), (1,3,0), (1,1,1), (1,2,1), (1,3,1); $X_{(1)}$ takes on values 0, 1; $X_{(2)}$ takes on values 0, 1; and $X_{(3)}$ takes on values 1, 2, 3.
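The enumeration in Example 1 is small enough to reproduce directly. The following sketch lists all triplets and collects the values taken by each order statistic; the variable names are illustrative only.

```python
from itertools import product

# Supports of X_1, X_2, X_3 as given in Example 1.
values_x1, values_x2, values_x3 = (0, 1), (1, 2, 3), (0, 1)

triplets = list(product(values_x1, values_x2, values_x3))

# order_stat_values[k] collects the values taken by the (k+1)th order statistic.
order_stat_values = [set() for _ in range(3)]
for t in triplets:
    for k, v in enumerate(sorted(t)):
        order_stat_values[k].add(v)

print(len(triplets), order_stat_values)
```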


Theorem 1. Let $(X_1, X_2, \ldots, X_n)$ be an n-dimensional RV. Let $X_{(k)}$, $1 \le k \le n$, be the order statistic of order k. Then $X_{(k)}$ is also an RV.


Statistical considerations such as sufficiency, completeness, invariance, and ancillarity 
(Chapter 8) lead to the consideration of order statistics in problems of statistical inference. 
Order statistics are particularly useful in nonparametric statistics (Chapter 13) where, for 
example, many test procedures are based on ranks of observations. Many of these methods 
require the distribution of the ordered observations which we now study. 

In the following we assume that $X_1, X_2, \ldots, X_n$ are iid RVs. In the discrete case there is no magic formula to compute the distribution of any $X_{(j)}$ or any of the joint distributions. A direct computation is the best course of action.


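Example 2. Suppose the $X_k$'s are iid with geometric PMF

$$p_k = P(X = k) = p q^{k-1}, \qquad k = 1,2,\ldots;\ 0 < p < 1,\ q = 1 - p.$$

Then for any integers $x \ge 1$ and $r \ge 1$

$$P\{X_{(r)} = x\} = P\{X_{(r)} \le x\} - P\{X_{(r)} \le x - 1\}.$$

Now

$$P\{X_{(r)} \le x\} = P\{\text{at least } r \text{ of the } X\text{'s are} \le x\} = \sum_{i=r}^n \binom{n}{i} [P\{X_1 \le x\}]^i [P\{X_1 > x\}]^{n-i}$$

and

$$P\{X_1 > x\} = \sum_{k=x+1}^\infty p q^{k-1} = q^x.$$

It follows that

$$P\{X_{(r)} = x\} = \sum_{i=r}^n \binom{n}{i} q^{(x-1)(n-i)} \left\{q^{n-i}(1 - q^x)^i - (1 - q^{x-1})^i\right\},$$

x = 1,2,.... In particular, let n = r = 2. Then,

$$P\{X_{(2)} = x\} = p q^{x-1}\{p q^{x-1} + 2 - 2q^{x-1}\}, \qquad x \ge 1.$$

Also for integers $x, y \ge 1$ we have

$$P\{X_{(1)} = x, X_{(2)} - X_{(1)} = y\} = P\{X_{(1)} = x, X_{(2)} = x + y\}$$
$$= P\{X_1 = x, X_2 = x + y\} + P\{X_1 = x + y, X_2 = x\}$$
$$= 2 p q^{x-1} \cdot p q^{x+y-1}$$
$$= 2 p q^{2x-2} \cdot p q^{y}$$

and

$$P\{X_{(1)} = 1, X_{(2)} - X_{(1)} = 0\} = P\{X_1 = X_2 = 1\} = p^2.$$

It follows that $X_{(1)}$ and $X_{(2)} - X_{(1)}$ are independent RVs.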
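The independence claimed at the end of Example 2 (case n = 2) can be checked by comparing the joint PMF with the product of the marginals. In the sketch below the two marginal formulas are not quoted from the text: they are closed forms obtained by summing the joint PMF and should be read as assumptions of this illustration, as are the function names.

```python
from fractions import Fraction

def pmf(k, p):
    """Geometric PMF on k = 1, 2, ...: P{X = k} = p q^(k-1)."""
    return p * (1 - p) ** (k - 1)

def joint_min_gap(x, y, p):
    """P{X_(1) = x, X_(2) - X_(1) = y} for two iid geometric RVs."""
    if y == 0:
        return pmf(x, p) ** 2
    return 2 * pmf(x, p) * pmf(x + y, p)

def marginal_min(x, p):
    """Assumed closed form: the minimum is geometric with parameter 1 - q^2."""
    q = 1 - p
    return q ** (2 * (x - 1)) * (1 - q ** 2)

def marginal_gap(y, p):
    """Assumed closed form for P{X_(2) - X_(1) = y}, summing the joint PMF over x."""
    q = 1 - p
    return p / (1 + q) if y == 0 else 2 * p * q ** y / (1 + q)

p = Fraction(1, 3)
factorizes = all(
    joint_min_gap(x, y, p) == marginal_min(x, p) * marginal_gap(y, p)
    for x in range(1, 8)
    for y in range(0, 8)
)
print(factorizes)
```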


In the following we assume that $X_1, X_2, \ldots, X_n$ are iid RVs of the continuous type with PDF f. Let $\{X_{(1)}, X_{(2)}, \ldots, X_{(n)}\}$ be the set of order statistics for $X_1, X_2, \ldots, X_n$. Since the $X_i$ are all continuous type RVs, it follows with probability 1 that

$$X_{(1)} < X_{(2)} < \cdots < X_{(n)}.$$


Theorem 2. The joint PDF of $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is given by

$$g(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \begin{cases} n! \prod_{i=1}^n f(x_{(i)}), & x_{(1)} < x_{(2)} < \cdots < x_{(n)}, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$


Proof. The transformation from $(X_1, X_2, \ldots, X_n)$ to $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is not one-to-one. In fact, there are n! possible arrangements of $x_1, x_2, \ldots, x_n$ in increasing order of magnitude. Thus there are n! inverses to the transformation. For example, if one of the n! permutations is

$$x_4 < x_1 < x_{n-1} < x_3 < \cdots < x_n < x_2,$$

then the corresponding inverse is

$$x_4 = x_{(1)},\ x_1 = x_{(2)},\ x_{n-1} = x_{(3)},\ x_3 = x_{(4)},\ \ldots,\ x_n = x_{(n-1)},\ x_2 = x_{(n)}.$$

The Jacobian of this transformation is the determinant of an n × n identity matrix with rows rearranged, since each $x_{(i)}$ equals one and only one of $x_1, x_2, \ldots, x_n$. Therefore $J = \pm 1$ and, writing f also for the joint PDF of $(X_1, X_2, \ldots, X_n)$,

$$f(x_{(2)}, x_{(n)}, x_{(4)}, x_{(1)}, \ldots, x_{(3)}, x_{(n-1)})\,|J| = \prod_{i=1}^n f(x_{(i)}), \qquad x_{(1)} < x_{(2)} < \cdots < x_{(n)}.$$

The same expression holds for each of the n! arrangements.
It follows (see Remark 2) that

$$g(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \sum_{\text{all } n! \text{ inverses}} \prod_{i=1}^n f(x_{(i)}) = \begin{cases} n!\, f(x_{(1)}) f(x_{(2)}) \cdots f(x_{(n)}) & \text{if } x_{(1)} < x_{(2)} < \cdots < x_{(n)}, \\ 0 & \text{otherwise.} \end{cases}$$

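Example 3. Let $X_1, X_2, X_3, X_4$ be iid RVs with PDF f. The joint PDF of $X_{(1)}, X_{(2)}, X_{(3)}, X_{(4)}$ is

$$g(y_1, y_2, y_3, y_4) = \begin{cases} 4!\, f(y_1) f(y_2) f(y_3) f(y_4), & y_1 < y_2 < y_3 < y_4, \\ 0, & \text{otherwise.} \end{cases}$$

Let us compute the marginal PDF of $X_{(2)}$. We have

$$g_2(y_2) = 4! \iiint f(y_1) f(y_2) f(y_3) f(y_4)\,dy_1\,dy_3\,dy_4$$
$$= 4!\, f(y_2) \int_{-\infty}^{y_2} \int_{y_2}^{\infty} \left[\int_{y_3}^{\infty} f(y_4)\,dy_4\right] f(y_3) f(y_1)\,dy_3\,dy_1$$
$$= 4!\, f(y_2) \int_{-\infty}^{y_2} \left\{\int_{y_2}^{\infty} [1 - F(y_3)] f(y_3)\,dy_3\right\} f(y_1)\,dy_1$$
$$= 4!\, f(y_2) \int_{-\infty}^{y_2} \frac{[1 - F(y_2)]^2}{2} f(y_1)\,dy_1$$
$$= 4!\, f(y_2)\, \frac{[1 - F(y_2)]^2}{2!}\, F(y_2), \qquad y_2 \in \mathcal{R}.$$

The procedure for computing the marginal PDF of $X_{(r)}$, the rth-order statistic of $X_1, X_2, \ldots, X_n$, is similar. The following theorem summarizes the result.

Theorem 3. The marginal PDF of $X_{(r)}$ is given by

$$g_r(y_r) = \frac{n!}{(r-1)!\,(n-r)!} [F(y_r)]^{r-1} [1 - F(y_r)]^{n-r} f(y_r), \tag{2}$$

where F is the common DF of $X_1, X_2, \ldots, X_n$.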


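Proof.

$$g_r(y_r) = n!\, f(y_r) \int_{-\infty}^{y_r} \int_{-\infty}^{y_{r-1}} \cdots \int_{-\infty}^{y_2} \int_{y_r}^{\infty} \int_{y_{r+1}}^{\infty} \cdots \int_{y_{n-1}}^{\infty} \prod_{i \ne r} f(y_i)\,dy_n \cdots dy_{r+1}\,dy_1 \cdots dy_{r-1}$$
$$= n!\, f(y_r)\, \frac{[1 - F(y_r)]^{n-r}}{(n-r)!} \int_{-\infty}^{y_r} \cdots \int_{-\infty}^{y_2} \prod_{i=1}^{r-1} f(y_i)\,dy_1 \cdots dy_{r-1}$$
$$= n!\, f(y_r)\, \frac{[1 - F(y_r)]^{n-r}}{(n-r)!}\, \frac{[F(y_r)]^{r-1}}{(r-1)!},$$

as asserted.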


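We now compute the joint PDF of $X_{(j)}$ and $X_{(k)}$, $1 \le j < k \le n$.


Theorem 4. The joint PDF of $X_{(j)}$ and $X_{(k)}$ is given by

$$g_{jk}(y_j, y_k) = \begin{cases} \dfrac{n!}{(j-1)!\,(k-j-1)!\,(n-k)!}\, F^{j-1}(y_j)\, [F(y_k) - F(y_j)]^{k-j-1}\, [1 - F(y_k)]^{n-k} f(y_j) f(y_k) & \text{if } y_j < y_k, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

Proof. Integrating the joint PDF of Theorem 2 over $y_1 < \cdots < y_{j-1} < y_j$, over $y_j < y_{j+1} < \cdots < y_{k-1} < y_k$, and over $y_k < y_{k+1} < \cdots < y_n$, we have

$$g_{jk}(y_j, y_k) = \int \cdots \int n!\, f(y_1) \cdots f(y_n)\,dy_n \cdots dy_{k+1}\,dy_{k-1} \cdots dy_{j+1}\,dy_1 \cdots dy_{j-1}$$
$$= n! \int \cdots \int \frac{[1 - F(y_k)]^{n-k}}{(n-k)!}\, f(y_1) \cdots f(y_k)\,dy_{k-1} \cdots dy_{j+1}\,dy_1 \cdots dy_{j-1}$$
$$= n!\, \frac{[1 - F(y_k)]^{n-k}}{(n-k)!}\, f(y_k) \int \cdots \int \frac{[F(y_k) - F(y_j)]^{k-j-1}}{(k-j-1)!}\, f(y_1) \cdots f(y_j)\,dy_1 \cdots dy_{j-1}$$
$$= n!\, \frac{[1 - F(y_k)]^{n-k}}{(n-k)!}\, \frac{[F(y_k) - F(y_j)]^{k-j-1}}{(k-j-1)!}\, \frac{[F(y_j)]^{j-1}}{(j-1)!}\, f(y_j) f(y_k), \qquad y_j < y_k,$$

as asserted.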


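In a similar manner we can show that the joint PDF of $X_{(j_1)}, \ldots, X_{(j_k)}$, $1 \le j_1 < j_2 < \cdots < j_k \le n$, $1 \le k \le n$, is given by

$$g_{j_1, j_2, \ldots, j_k}(y_1, y_2, \ldots, y_k) = \frac{n!}{(j_1 - 1)!\,(j_2 - j_1 - 1)! \cdots (n - j_k)!}\, F^{j_1 - 1}(y_1) f(y_1)\, [F(y_2) - F(y_1)]^{j_2 - j_1 - 1} f(y_2) \cdots [1 - F(y_k)]^{n - j_k} f(y_k)$$

for $y_1 < y_2 < \cdots < y_k$, and = 0 otherwise.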


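Example 4. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Then,

$$g_r(y_r) = \begin{cases} \dfrac{n!}{(r-1)!\,(n-r)!}\, y_r^{r-1} (1 - y_r)^{n-r}, & 0 < y_r < 1, \\ 0, & \text{otherwise,} \end{cases} \qquad (1 \le r \le n).$$

The joint distribution of $X_{(j)}$ and $X_{(k)}$ is given by

$$g_{jk}(y_j, y_k) = \begin{cases} \dfrac{n!}{(j-1)!\,(k-j-1)!\,(n-k)!}\, y_j^{j-1} (y_k - y_j)^{k-j-1} (1 - y_k)^{n-k}, & 0 < y_j < y_k < 1, \\ 0, & \text{otherwise,} \end{cases}$$

where $1 \le j < k \le n$.
The joint PDF of $X_{(1)}$ and $X_{(n)}$ is given by

$$g_{1n}(y_1, y_n) = n(n-1)(y_n - y_1)^{n-2}, \qquad 0 < y_1 < y_n < 1,$$

and that of the range $R_n = X_{(n)} - X_{(1)}$ by

$$g_{R_n}(w) = \begin{cases} n(n-1)\, w^{n-2} (1 - w), & 0 < w < 1, \\ 0, & \text{otherwise.} \end{cases}$$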
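The uniform-case densities of Example 4 lend themselves to a quick numerical sanity check. The sketch below integrates the marginal density of each order statistic and the density of the range by the midpoint rule, and also computes the means, which for the uniform case are known to equal r/(n+1). The sample size n = 5 and the helper names are assumptions of this illustration.

```python
from math import factorial

def g_r(y, r, n):
    """PDF of the r-th order statistic of n iid U(0,1) RVs, for 0 < y < 1."""
    c = factorial(n) / (factorial(r - 1) * factorial(n - r))
    return c * y ** (r - 1) * (1 - y) ** (n - r)

def integrate(h, steps=20_000):
    """Midpoint-rule approximation of the integral of h over (0, 1)."""
    w = 1.0 / steps
    return w * sum(h((i + 0.5) * w) for i in range(steps))

n = 5  # assumed sample size for the illustration
totals = [integrate(lambda y, r=r: g_r(y, r, n)) for r in range(1, n + 1)]
means = [integrate(lambda y, r=r: y * g_r(y, r, n)) for r in range(1, n + 1)]
range_total = integrate(lambda w: n * (n - 1) * w ** (n - 2) * (1 - w))

print(totals)       # each entry should be close to 1
print(means)        # entry r-1 should be close to r / (n + 1)
print(range_total)  # should be close to 1
```

The means add up to n/2, which is n times the mean of a single U(0,1) RV.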
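Example 5. Let $X_{(1)}, X_{(2)}, X_{(3)}$ be the order statistics of iid RVs $X_1, X_2, X_3$ with common PDF

$$f(x) = \begin{cases} \beta e^{-\beta x}, & x > 0, \\ 0, & \text{otherwise,} \end{cases} \qquad (\beta > 0).$$

Let $Y_1 = X_{(3)} - X_{(2)}$ and $Y_2 = X_{(2)}$. We show that $Y_1$ and $Y_2$ are independent. The joint PDF of $X_{(2)}$ and $X_{(3)}$ is given by

$$g_{23}(x, y) = \begin{cases} \dfrac{3!}{1!\,0!\,0!}\,(1 - e^{-\beta x})\, \beta e^{-\beta x}\, \beta e^{-\beta y}, & x < y, \\ 0, & \text{otherwise.} \end{cases}$$

The PDF of $(Y_1, Y_2)$ is

$$f(y_1, y_2) = 3!\, \beta^2 (1 - e^{-\beta y_2})\, e^{-\beta y_2}\, e^{-\beta(y_1 + y_2)}$$
$$= \begin{cases} \{3!\, \beta e^{-2\beta y_2}(1 - e^{-\beta y_2})\}\{\beta e^{-\beta y_1}\}, & 0 < y_1 < \infty,\ 0 < y_2 < \infty, \\ 0, & \text{otherwise.} \end{cases}$$

It follows that $Y_1$ and $Y_2$ are independent.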


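Finally, we consider the moments, namely, the means, variances, and covariances of order statistics. Suppose $X_1, X_2, \ldots, X_n$ are iid RVs with common DF F. Let g be a Borel function on $\mathcal{R}$ such that $E|g(X)| < \infty$, where X has DF F. Then for $1 \le r \le n$

$$\int_{-\infty}^{\infty} |g(x)|\, \frac{n!}{(r-1)!\,(n-r)!}\, F^{r-1}(x) [1 - F(x)]^{n-r} f(x)\,dx \le n \binom{n-1}{r-1} \int_{-\infty}^{\infty} |g(x)| f(x)\,dx < \infty$$

and we write

$$Eg(X_{(r)}) = \int_{-\infty}^{\infty} g(y)\, g_r(y)\,dy,$$

for r = 1,2,...,n. The converse also holds. Suppose $E|g(X_{(r)})| < \infty$ for r = 1,2,...,n. Then,

$$n \binom{n-1}{r-1} \int_{-\infty}^{\infty} |g(x)|\, F^{r-1}(x) [1 - F(x)]^{n-r} f(x)\,dx < \infty$$

for r = 1,2,...,n and hence

$$n \int_{-\infty}^{\infty} \left[\sum_{r=1}^n \binom{n-1}{r-1} F^{r-1}(x) [1 - F(x)]^{n-r}\right] |g(x)| f(x)\,dx = n \int_{-\infty}^{\infty} |g(x)| f(x)\,dx < \infty.$$

Moreover, it also follows that

$$\sum_{r=1}^n Eg(X_{(r)}) = n\,Eg(X).$$

As a consequence of the above remarks we note that if $E|g(X_{(r)})| = \infty$ for some r, $1 \le r \le n$, then $E|g(X)| = \infty$ and conversely, if $E|g(X)| = \infty$ then $E|g(X_{(r)})| = \infty$ for some r, $1 \le r \le n$.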


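Example 6. Let $X_1, X_2, \ldots, X_n$ be iid with Pareto PDF $f(x) = 1/x^2$, if $x \ge 1$, and = 0 otherwise. Then $EX = \infty$. Now for $1 \le r \le n$

$$EX_{(r)} = n \binom{n-1}{r-1} \int_1^\infty x \left(1 - \frac{1}{x}\right)^{r-1} \frac{1}{x^{n-r}}\, \frac{dx}{x^2}$$
$$= n \binom{n-1}{r-1} \int_0^1 y^{n-r-1} (1 - y)^{r-1}\,dy.$$

Since the integral on the right side converges for $1 \le r \le n - 1$ and diverges for r = n, we see that $EX_{(r)} < \infty$ for r = 1,...,n − 1 and $EX_{(n)} = \infty$.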


PROBLEMS 4.7 


1. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the set of order statistics of independent RVs $X_1, X_2, \ldots, X_n$ with common PDF

$$f(x) = \begin{cases} \beta e^{-\beta x} & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$

(a) Show that $X_{(r)}$ and $X_{(s)} - X_{(r)}$ are independent for any s > r.
(b) Find the PDF of $X_{(r+1)} - X_{(r)}$.
(c) Let $Z_1 = nX_{(1)}$, $Z_2 = (n-1)(X_{(2)} - X_{(1)})$, $Z_3 = (n-2)(X_{(3)} - X_{(2)})$, ..., $Z_n = (X_{(n)} - X_{(n-1)})$. Show that $(Z_1, Z_2, \ldots, Z_n)$ and $(X_1, X_2, \ldots, X_n)$ are identically distributed.


2. Let $X_1, X_2, \ldots, X_n$ be iid from PMF

$$p_k = 1/N, \qquad k = 1,2,\ldots,N.$$

Find the marginal distributions of $X_{(1)}$, $X_{(n)}$, and their joint PMF.

3. Let $X_1, X_2, \ldots, X_n$ be iid with a DF


ie y* if0<y<l, 
ae 0 otherwise, a>0. 


Show that $X_{(i)}/X_{(n)}$, i = 1,2,...,n − 1, and $X_{(n)}$ are independent.

4. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common Pareto PDF $f(x) = \alpha\sigma^\alpha/x^{\alpha+1}$, $x > \sigma$, where $\alpha > 0$, $\sigma > 0$. Show that
(a) $X_{(1)}$ and $(X_{(2)}/X_{(1)}, \ldots, X_{(n)}/X_{(1)})$ are independent,
(b) $X_{(1)}$ has Pareto$(\sigma, n\alpha)$ distribution, and
(c) $\sum_{i=1}^n \ln(X_{(i)}/X_{(1)})$ has PDF

$$f(x) = \frac{\alpha^{n-1} x^{n-2} e^{-\alpha x}}{(n-2)!}, \qquad x > 0.$$

5. Let $X_1, X_2, \ldots, X_n$ be iid nonnegative RVs of the continuous type. If $E|X| < \infty$, show that $E|X_{(r)}| < \infty$. Write $M_n = X_{(n)} = \max(X_1, X_2, \ldots, X_n)$. Show that

$$EM_n = EM_{n-1} + \int_0^\infty F^{n-1}(x)[1 - F(x)]\,dx, \qquad n = 2,3,\ldots.$$

Find $EM_n$ in each of the following cases:

(a) $X_i$ have the common DF

$$F(x) = 1 - e^{-\beta x}, \qquad x \ge 0.$$

(b) $X_i$ have the common DF

$$F(x) = x, \qquad 0 < x < 1.$$

6. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of n independent RVs $X_1, X_2, \ldots, X_n$ with common PDF f(x) = 1 if 0 < x < 1, and = 0 otherwise. Show that $Y_1 = X_{(1)}/X_{(2)}$, $Y_2 = X_{(2)}/X_{(3)}$, ..., $Y_{n-1} = X_{(n-1)}/X_{(n)}$, and $Y_n = X_{(n)}$ are independent. Find the PDFs of $Y_1, Y_2, \ldots, Y_n$.

7. For the PDF in Problem 4 find $EX_{(r)}$.

8. An urn contains N identical marbles numbered 1 through N. From the urn n marbles are drawn and let $X_{(n)}$ be the largest number drawn. Show that $P(X_{(n)} = k) = \binom{k-1}{n-1}\big/\binom{N}{n}$, k = n, n+1, ..., N, and $EX_{(n)} = n(N+1)/(n+1)$.


5 
5.1 INTRODUCTION


In preceding chapters we studied probability distributions in general. In this chapter we 
will study some commonly occurring probability distributions and investigate their basic 
properties. The results of this chapter will be of considerable use in theoretical as well 
as practical applications. We begin with some discrete distributions in Section 5.2 and 
follow with some continuous models in Section 5.3. Section 5.4 deals with bivariate and 
multivariate normal distributions and in Section 5.5 we discuss the exponential family of 
distributions. 


5.2. SOME DISCRETE DISTRIBUTIONS 


In this section we study some well-known univariate and multivariate discrete distributions 
and describe their important properties. 


5.2.1 Degenerate Distribution 


The simplest distribution is that of an RV X degenerate at point k, that is, P{X = k} = 1 and = 0 elsewhere. If we define

$$\varepsilon(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0, \end{cases} \tag{1}$$

the DF of the RV X is $\varepsilon(x - k)$. Clearly, $EX^l = k^l$, l = 1,2,..., and $M(t) = e^{tk}$. In particular, var(X) = 0. This property characterizes a degenerate RV. As we shall see, the degenerate RV plays an important role in the study of limit theorems.


5.2.2. Two-Point Distribution 


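We say that an RV X has a two-point distribution if it takes two values, $x_1$ and $x_2$, with probabilities

$$P\{X = x_1\} = p, \qquad P\{X = x_2\} = 1 - p, \qquad 0 < p < 1.$$

We may write

$$X = x_1 I_{[X = x_1]} + x_2 I_{[X = x_2]}, \tag{2}$$

where $I_A$ is the indicator function of A. The DF of X is given by

$$F(x) = p\,\varepsilon(x - x_1) + (1 - p)\,\varepsilon(x - x_2). \tag{3}$$

Also

$$EX^k = p x_1^k + (1 - p) x_2^k, \qquad k = 1,2,\ldots, \tag{4}$$

$$M(t) = p e^{t x_1} + (1 - p) e^{t x_2} \quad \text{for all } t. \tag{5}$$

In particular,

$$EX = p x_1 + (1 - p) x_2 \tag{6}$$

and

$$\operatorname{var}(X) = p(1 - p)(x_1 - x_2)^2. \tag{7}$$

If $x_1 = 1$, $x_2 = 0$, we get the important Bernoulli RV:

$$P\{X = 1\} = p, \qquad P\{X = 0\} = 1 - p, \qquad 0 < p < 1. \tag{8}$$

For a Bernoulli RV X with parameter p, we write X ~ b(1,p) and have

$$EX = p, \qquad \operatorname{var}(X) = p(1 - p), \qquad M(t) = 1 + p(e^t - 1), \quad \text{all } t. \tag{9}$$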


Bernoulli RVs occur in practice, for example, in coin-tossing experiments. Suppose that P{H} = p, 0 < p < 1, and P{T} = 1 − p. Define RV X so that X(H) = 1 and X(T) = 0. Then P{X = 1} = p and P{X = 0} = 1 − p. Each repetition of the experiment will be called a trial. More generally, any nontrivial experiment can be dichotomized to yield a Bernoulli model. Let $(\Omega, \mathcal{S}, P)$ be the sample space of an experiment, and let $A \in \mathcal{S}$ with P(A) = p > 0. Then $P(A^c) = 1 - p$. Each performance of the experiment is a Bernoulli trial. It will be convenient to call the occurrence of event A a success and the occurrence of $A^c$, a failure.
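Example 1 (Sabharwal [97]). In a sequence of n Bernoulli trials with constant probability p of success (S), and 1 − p of failure (F), let $Y_n$ denote the number of times the combination SF occurs. To find $EY_n$ and $\operatorname{var}(Y_n)$, let $X_i$ represent the event that occurs on the ith trial, and define RVs

$$f(X_i, X_{i+1}) = \begin{cases} 1 & \text{if } X_i = S,\ X_{i+1} = F, \\ 0 & \text{otherwise,} \end{cases} \qquad (i = 1,2,\ldots,n-1).$$

Then

$$Y_n = \sum_{i=1}^{n-1} f(X_i, X_{i+1})$$

and

$$EY_n = (n-1)p(1-p).$$

Also,

$$EY_n^2 = E\left\{\sum_{i=1}^{n-1} f^2(X_i, X_{i+1})\right\} + E\left\{\sum\sum_{i \ne j} f(X_i, X_{i+1}) f(X_j, X_{j+1})\right\}$$
$$= (n-1)p(1-p) + (n-2)(n-3)p^2(1-p)^2,$$

so that, subtracting $(EY_n)^2$,

$$\operatorname{var}(Y_n) = p(1-p)[n - 1 + p(1-p)(5 - 3n)].$$

If p = 1/2, then

$$EY_n = \frac{n-1}{4} \quad \text{and} \quad \operatorname{var}(Y_n) = \frac{n+1}{16}.$$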
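For small n the moments of $Y_n$ can be obtained by brute-force enumeration of all $2^n$ outcome sequences, which gives an independent check on $EY_n = (n-1)p(1-p)$ and on the second-moment formula above. The sketch below uses exact fractions; the function name `sf_count_moments` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def sf_count_moments(n, p):
    """Exact mean and variance of Y_n, the number of SF pairs in n Bernoulli trials.

    Enumerates all 2^n sequences, so it is intended for small n only.
    """
    q = 1 - p
    mean = Fraction(0)
    second = Fraction(0)
    for seq in product("SF", repeat=n):
        prob = Fraction(1)
        for s in seq:
            prob *= p if s == "S" else q
        # Count positions i with an S immediately followed by an F.
        y = sum(1 for a, b in zip(seq, seq[1:]) if (a, b) == ("S", "F"))
        mean += y * prob
        second += y * y * prob
    return mean, second - mean ** 2

m, v = sf_count_moments(7, Fraction(1, 2))
print(m, v)
```

For n = 7 and p = 1/2 the enumeration gives mean 3/2 and variance 1/2.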


5.2.3 Uniform Distribution on n Points 


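X is said to have a uniform distribution on n points $\{x_1, x_2, \ldots, x_n\}$ if its PMF is of the form

$$P\{X = x_i\} = \frac{1}{n}, \qquad i = 1,2,\ldots,n. \tag{10}$$

Thus we may write

$$X = \sum_{i=1}^n x_i I_{[X = x_i]} \quad \text{and} \quad F(x) = \frac{1}{n} \sum_{i=1}^n \varepsilon(x - x_i),$$

$$EX = \frac{1}{n} \sum_{i=1}^n x_i, \tag{11}$$

$$EX^l = \frac{1}{n} \sum_{i=1}^n x_i^l, \qquad l = 1,2,\ldots, \tag{12}$$

and

$$\operatorname{var}(X) = \frac{1}{n} \sum_{i=1}^n x_i^2 - \left(\frac{1}{n} \sum_{i=1}^n x_i\right)^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2, \tag{13}$$

where $\bar{x} = n^{-1} \sum_{i=1}^n x_i$. Also,

$$M(t) = \frac{1}{n} \sum_{i=1}^n e^{t x_i} \quad \text{for all } t. \tag{14}$$

If, in particular, $x_i = i$, i = 1,2,...,n,

$$EX = \frac{n+1}{2}, \qquad EX^2 = \frac{(n+1)(2n+1)}{6}, \tag{15}$$

$$\operatorname{var}(X) = \frac{n^2 - 1}{12}. \tag{16}$$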

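Example 2. A box contains tickets numbered 1 to N. Let X be the largest number drawn in n random drawings with replacement.
Then $P\{X \le k\} = (k/N)^n$, so that

$$P\{X = k\} = P\{X \le k\} - P\{X \le k - 1\} = \left(\frac{k}{N}\right)^n - \left(\frac{k-1}{N}\right)^n.$$

Also,

$$EX = N^{-n} \sum_{k=1}^N \left[k^{n+1} - (k-1)^{n+1} - (k-1)^n\right] = N^{-n} \left[N^{n+1} - \sum_{k=1}^N (k-1)^n\right].$$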
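The closed form for EX in Example 2 can be compared with the mean computed directly from the PMF of the largest ticket number. The following is a small sketch with exact arithmetic; the function names and the values N = 6, n = 3 are assumptions of the illustration.

```python
from fractions import Fraction

def max_pmf(N, n):
    """PMF of the largest number in n draws with replacement from 1..N."""
    return {k: Fraction(k, N) ** n - Fraction(k - 1, N) ** n for k in range(1, N + 1)}

def mean_direct(N, n):
    """EX computed as sum_k k P{X = k}."""
    return sum(k * p for k, p in max_pmf(N, n).items())

def mean_closed_form(N, n):
    """EX = N^(-n) [N^(n+1) - sum_{k=1}^N (k-1)^n]."""
    return Fraction(N ** (n + 1) - sum((k - 1) ** n for k in range(1, N + 1)), N ** n)

print(mean_direct(6, 3), mean_closed_form(6, 3))
```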


5.2.4 Binomial Distribution 


We say that X has a binomial distribution with parameter p if its PMF is given by

$$p_k = P\{X = k\} = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0,1,2,\ldots,n;\ 0 < p < 1. \tag{17}$$

Since $\sum_{k=0}^n p_k = [p + (1-p)]^n = 1$, the $p_k$'s indeed define a PMF. If X has PMF (17), we will write X ~ b(n,p). This is consistent with the notation for a Bernoulli RV. We have

$$F(x) = \sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k}\, \varepsilon(x - k).$$

In Example 3.2.5 we showed that

$$EX = np, \tag{18}$$

$$EX^2 = n(n-1)p^2 + np, \tag{19}$$

and

$$\operatorname{var}(X) = np(1-p) = npq, \tag{20}$$

where q = 1 − p. Also,

$$M(t) = \sum_{k=0}^n e^{tk} \binom{n}{k} p^k (1-p)^{n-k} = (q + pe^t)^n \quad \text{for all } t. \tag{21}$$

The PGF of X ~ b(n,p) is given by $P(s) = \{1 - p(1-s)\}^n$, $|s| \le 1$.

Binomial distribution can also be considered as the distribution of the sum of n independent, identically distributed b(1,p) random variables. If we toss a coin, with constant probability p of heads and 1 − p of tails, n times, the distribution of the number of heads is given by (17). Alternatively, if we write

$$X_k = \begin{cases} 1 & \text{if the } k\text{th toss results in a head,} \\ 0 & \text{otherwise,} \end{cases}$$

the number of heads in n trials is the sum $S_n = X_1 + X_2 + \cdots + X_n$. Also

$$P\{X_k = 1\} = p, \qquad P\{X_k = 0\} = 1 - p, \qquad k = 1,2,\ldots,n.$$

Thus

$$ES_n = \sum_{i=1}^n EX_i = np,$$

$$\operatorname{var}(S_n) = \sum_{i=1}^n \operatorname{var}(X_i) = np(1-p),$$

and

$$M(t) = \prod_{i=1}^n Ee^{tX_i} = (q + pe^t)^n.$$

Theorem 1. Let $X_i$ (i = 1,2,...,k) be independent RVs with $X_i \sim b(n_i, p)$. Then $S_k = \sum_{i=1}^k X_i$ has a $b(n_1 + n_2 + \cdots + n_k, p)$ distribution.


Corollary. If $X_i$ (i = 1,2,...,k) are iid RVs with common PMF b(n,p), then $S_k$ has a b(nk,p) distribution.
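The additive property in Theorem 1 can be illustrated by convolving two binomial PMFs with the same p and comparing the result with the PMF of $b(n_1 + n_2, p)$. This is a sketch with exact fractions, not a proof; `binom_pmf` and `convolve` are names chosen for the illustration.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p):
    """List [P{X = 0}, ..., P{X = n}] for X ~ b(n, p)."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(a, b):
    """PMF of the sum of two independent RVs with PMFs a and b on 0, 1, 2, ..."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

p = Fraction(2, 7)  # assumed common success probability
lhs = convolve(binom_pmf(3, p), binom_pmf(4, p))
rhs = binom_pmf(7, p)
print(lhs == rhs)
```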
Actually, the additive property described in Theorem 1 characterizes the binomial distribution in the following sense. Let X and Y be two independent, nonnegative, finite integer-valued RVs and let Z = X + Y. Then Z is a binomial RV with parameter p if and only if X and Y are binomial RVs with the same parameter p. The “only if” part is due to Shanbhag and Basawa [103] and will not be proved here.


Example 3. A fair die is rolled n times. The probability of obtaining exactly one 6 is $n(\frac{1}{6})(\frac{5}{6})^{n-1}$, the probability of obtaining no 6 is $(\frac{5}{6})^n$, and the probability of obtaining at least one 6 is $1 - (\frac{5}{6})^n$.

The number of trials needed for the probability of at least one 6 to be ≥ 1/2 is given by the smallest integer n such that

$$1 - \left(\frac{5}{6}\right)^n \ge \frac{1}{2},$$

so that

$$n \ge \frac{\log 2}{\log 1.2} \approx 3.8,$$

that is, n = 4.


Example 4. Here r balls are distributed in n cells so that each of $n^r$ possible arrangements has probability $n^{-r}$. We are interested in the probability $p_k$ that a specified cell has exactly k balls (k = 0,1,2,...,r). Then the distribution of each ball may be considered as a trial. A success results if the ball goes to the specified cell (with probability 1/n); otherwise the trial results in a failure (with probability 1 − 1/n). Let X denote the number of successes in r trials. Then

$$p_k = P\{X = k\} = \binom{r}{k} \left(\frac{1}{n}\right)^k \left(1 - \frac{1}{n}\right)^{r-k}, \qquad k = 0,1,2,\ldots,r.$$


5.2.5 Negative Binomial Distribution (Pascal or Waiting Time Distribution) 


Let $(\Omega, \mathcal{S}, P)$ be a probability space of a given statistical experiment, and let $A \in \mathcal{S}$ with P(A) = p. On any performance of the experiment, if A happens we call it a success, otherwise a failure. Consider a succession of trials of this experiment, and let us compute the probability of observing exactly r successes, where r ≥ 1 is a fixed integer. If X denotes the number of failures that precede the rth success, X + r is the total number of replications needed to produce r successes. This will happen if and only if the last trial results in a success and among the previous (r + X − 1) trials there are exactly X failures. It follows by independence that

$$P\{X = x\} = \binom{x + r - 1}{x} p^r (1-p)^x, \qquad x = 0,1,2,\ldots. \tag{22}$$

Rewriting (22) in the form

$$P\{X = x\} = \binom{-r}{x} p^r (-q)^x, \qquad x = 0,1,2,\ldots;\ q = 1 - p, \tag{23}$$

we see that

$$\sum_{x=0}^\infty \binom{-r}{x} (-q)^x = (1 - q)^{-r} = p^{-r}. \tag{24}$$

It follows that

$$\sum_{x=0}^\infty P\{X = x\} = 1.$$


x=0 


Definition 1. For a fixed positive integer r ≥ 1 and 0 < p < 1, an RV with PMF given by (22) is said to have a negative binomial distribution. We will use the notation X ~ NB(r;p) to denote that X has a negative binomial distribution.


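We may write

$$X = \sum_{x=0}^\infty x\, I_{[X = x]} \quad \text{and} \quad F(x) = \sum_{k=0}^\infty \binom{k + r - 1}{k} p^r (1-p)^k\, \varepsilon(x - k).$$

For the MGF of X we have

$$M(t) = \sum_{x=0}^\infty \binom{x + r - 1}{x} p^r q^x e^{tx}$$
$$= p^r \sum_{x=0}^\infty \binom{x + r - 1}{x} (qe^t)^x \qquad (q = 1 - p)$$
$$= p^r (1 - qe^t)^{-r} \quad \text{for } qe^t < 1. \tag{25}$$

The PGF is given by $P(s) = p^r (1 - sq)^{-r}$, $|s| \le 1$. Also,

$$EX = \sum_{x=0}^\infty x \binom{x + r - 1}{x} p^r q^x$$
$$= r p^r \sum_{x=0}^\infty \binom{x + r}{x} q^{x+1}$$
$$= r p^r q (1 - q)^{-r-1} = \frac{rq}{p}. \tag{26}$$

Similarly, we can show that

$$\operatorname{var}(X) = \frac{rq}{p^2}. \tag{27}$$

If, however, we are interested in the distribution of the number of trials required to get r successes, we have, writing Y = X + r,

$$P\{Y = y\} = \binom{y - 1}{r - 1} p^r (1-p)^{y-r}, \qquad y = r, r+1, \ldots, \tag{28}$$

$$EY = EX + r = \frac{r}{p}, \qquad \operatorname{var}(Y) = \operatorname{var}(X) = \frac{rq}{p^2}, \tag{29}$$

and

$$M_Y(t) = (pe^t)^r (1 - qe^t)^{-r} \quad \text{for } qe^t < 1. \tag{30}$$

Let X be a b(n,p) RV, and let Y be the RV defined in (28). If there are r or more successes in the first n trials, at most n trials were required to obtain the first r of these successes. We have

$$P\{X \ge r\} = P\{Y \le n\} \tag{31}$$

and also

$$P\{X < r\} = P\{Y > n\}. \tag{32}$$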
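Relation (31) between the binomial tail and the DF of the waiting time for the rth success can be verified numerically. The sketch below evaluates both sides with exact fractions; the function names and the values of n, r, and p are assumptions made for the illustration.

```python
from fractions import Fraction
from math import comb

def binom_tail(n, p, r):
    """P{X >= r} for X ~ b(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r, n + 1))

def trials_cdf(n, p, r):
    """P{Y <= n}, where Y is the number of trials needed to get r successes."""
    return sum(comb(y - 1, r - 1) * p ** r * (1 - p) ** (y - r) for y in range(r, n + 1))

p = Fraction(1, 4)
print(binom_tail(10, p, 3), trials_cdf(10, p, 3))
```

Both sums are empty, and hence both sides are zero, when r > n.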
In the special case when r = 1, the distribution of X is given by

$$P\{X = x\} = p q^x, \qquad x = 0,1,2,\ldots. \tag{33}$$

An RV X with PMF (33) is said to have a geometric distribution. Clearly, for the geometric distribution, we have

$$M(t) = p(1 - qe^t)^{-1}, \qquad EX = \frac{q}{p}, \qquad \operatorname{var}(X) = \frac{q}{p^2}.$$

Example 5 (Banach's matchbox problem). A mathematician carries one matchbox each in his right and left pockets. When he wants a match, he selects the left pocket with probability p and the right pocket with probability 1 − p. Suppose that initially each box contains N matches. Consider the moment when the mathematician discovers that a box is empty. At that time the other box may contain 0,1,2,...,N matches. Let us identify success with the choice of the left pocket. The left-pocket box will be empty at the moment when the right-pocket box contains exactly r matches if and only if exactly N − r failures precede the (N + 1)st success. A similar argument applies to the right pocket, and we have

$p_r$ = probability that the mathematician discovers a box empty while the other contains r matches

$$= \binom{2N - r}{N} p^{N+1} q^{N-r} + \binom{2N - r}{N} q^{N+1} p^{N-r}.$$

Example 6. A fair die is rolled repeatedly. Let us compute the probability of event A that a 2 will show up before a 5. Let $A_j$ be the event that a 2 shows up on the jth trial (j = 1,2,...) for the first time, and a 5 does not show up on the previous j − 1 trials. Then $PA = \sum_{j=1}^\infty PA_j$, where $PA_j = \frac{1}{6}(\frac{4}{6})^{j-1}$. It follows that

$$P(A) = \sum_{j=1}^\infty \left(\frac{1}{6}\right)\left(\frac{4}{6}\right)^{j-1} = \frac{1}{2}.$$

Similarly the probability that a 2 will show up before a 5 or a 6 is 1/3, and so on.


Theorem 2. Let $X_1, X_2, \ldots, X_k$ be independent $NB(r_i; p)$ RVs, i = 1,2,...,k, respectively. Then $S_k = \sum_{i=1}^k X_i$ is distributed as $NB(r_1 + r_2 + \cdots + r_k; p)$.


Corollary. If $X_1, X_2, \ldots, X_k$ are iid geometric RVs, then $S_k$ is an NB(k;p) RV.


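Theorem 3. Let X and Y be independent RVs with PMFs $NB(r_1; p)$ and $NB(r_2; p)$, respectively. Then the conditional PMF of X, given X + Y = t, is expressed by

$$P\{X = x \mid X + Y = t\} = \frac{\binom{x + r_1 - 1}{x} \binom{t - x + r_2 - 1}{t - x}}{\binom{t + r_1 + r_2 - 1}{t}}, \qquad x = 0,1,\ldots,t.$$

If, in particular, $r_1 = r_2 = 1$, the conditional distribution is uniform on t + 1 points.


Proof. By Theorem 2, X + Y is an $NB(r_1 + r_2; p)$ RV. Thus

$$P\{X = x \mid X + Y = t\} = \frac{P\{X = x, Y = t - x\}}{P\{X + Y = t\}}$$
$$= \frac{\binom{x + r_1 - 1}{x} p^{r_1} (1-p)^x \binom{t - x + r_2 - 1}{t - x} p^{r_2} (1-p)^{t-x}}{\binom{t + r_1 + r_2 - 1}{t} p^{r_1 + r_2} (1-p)^t}$$
$$= \frac{\binom{x + r_1 - 1}{x} \binom{t - x + r_2 - 1}{t - x}}{\binom{t + r_1 + r_2 - 1}{t}}, \qquad t = 0,1,2,\ldots.$$

If $r_1 = r_2 = 1$, that is, if X and Y are independent geometric RVs, then

$$P\{X = x \mid X + Y = t\} = \frac{1}{t + 1}, \qquad x = 0,1,2,\ldots,t;\ t = 0,1,2,\ldots. \tag{35}$$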


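Theorem 4 (Chatterji [13]). Let X and Y be iid RVs, and let

\[
P\{X = k\} = p_k > 0, \qquad k = 0, 1, 2, \ldots.
\]

If

\[
P\{X = t \mid X + Y = t\} = P\{X = t-1 \mid X + Y = t\} = \frac{1}{t+1}, \qquad t > 0, \tag{36}
\]

then X and Y are geometric RVs.

Proof. We have

\[
P\{X = t \mid X + Y = t\} = \frac{p_t p_0}{\sum_{k=0}^{t} p_k p_{t-k}} = \frac{1}{t+1} \tag{37}
\]

and

\[
P\{X = t-1 \mid X + Y = t\} = \frac{p_{t-1} p_1}{\sum_{k=0}^{t} p_k p_{t-k}} = \frac{1}{t+1}. \tag{38}
\]

It follows that

\[
\frac{p_t}{p_{t-1}} = \frac{p_1}{p_0},
\]

and by iteration $p_t = (p_1/p_0)^t p_0$. Since $\sum_{t=0}^{\infty} p_t = 1$, we must have $(p_1/p_0) < 1$. Moreover,

\[
1 = p_0 \, \frac{1}{1 - (p_1/p_0)},
\]

so that $(p_1/p_0) = 1 - p_0$, and the proof is complete.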


Theorem 5. If X has a geometric distribution, then, for any two nonnegative integers m
and n,

\[
P\{X \ge m+n \mid X \ge m\} = P\{X \ge n\}. \tag{39}
\]

Proof. The proof is left as an exercise.


Remark 1. Theorem 5 says that the geometric distribution has no memory, that is, the
information of no successes in m trials is forgotten in subsequent calculations.
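
The memoryless property is easy to verify exactly. The sketch below assumes the geometric PMF $P\{X = j\} = p(1-p)^j$, $j = 0, 1, 2, \ldots$, so that $P\{X \ge k\} = (1-p)^k$; the helper name `geom_tail` is hypothetical.

```python
from fractions import Fraction


def geom_tail(k, p):
    """P{X >= k} when P{X = j} = p (1 - p)^j for j = 0, 1, 2, ..."""
    return (1 - p) ** k


def is_memoryless(p, max_m=6, max_n=6):
    """Check P{X >= m+n | X >= m} = P{X >= n} on a small grid of (m, n)."""
    return all(
        geom_tail(m + n, p) / geom_tail(m, p) == geom_tail(n, p)
        for m in range(max_m)
        for n in range(max_n)
    )


print(is_memoryless(Fraction(1, 4)))
```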


The converse of Theorem 5 is also true. 
Theorem 6. Let X be a nonnegative integer-valued RV satisfying

\[
P\{X > m+1 \mid X > m\} = P\{X \ge 1\}
\]

for any nonnegative integer m. Then X must have a geometric distribution.

Proof. Let the PMF of X be written as

\[
P\{X = k\} = p_k, \qquad k = 0, 1, 2, \ldots.
\]

Then

\[
P\{X \ge n\} = \sum_{k=n}^{\infty} p_k
\]

and

\[
P\{X > m\} = \sum_{k=m+1}^{\infty} p_k = q_m, \quad \text{say},
\]

\[
P\{X > m+1 \mid X > m\} = \frac{P\{X > m+1\}}{P\{X > m\}} = \frac{q_{m+1}}{q_m}.
\]

Thus

\[
q_{m+1} = q_m q_0,
\]

where $q_0 = P\{X > 0\} = p_1 + p_2 + \cdots = 1 - p_0$. It follows that $q_k = (1 - p_0)^{k+1}$, and hence
$p_k = q_{k-1} - q_k = (1 - p_0)^k p_0$, as asserted.


Theorem 7. Let $X_1, X_2, \ldots, X_n$ be independent geometric RVs with parameters $p_1, p_2, \ldots, p_n$,
respectively. Then $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ is also a geometric RV with parameter

\[
p = 1 - \prod_{i=1}^{n} (1 - p_i).
\]

Proof. The proof is left as an exercise.


Corollary. Iid RVs $X_1, X_2, \ldots, X_n$ are NB(1;p) if and only if $X_{(1)}$ is a geometric RV with
parameter $1 - (1-p)^n$.


Proof. The necessity follows from Theorem 7. For the sufficiency part of the proof let

\[
P\{X_{(1)} \le k\} = 1 - P\{X_{(1)} > k\} = 1 - (1-p)^{n(k+1)}.
\]

But

\[
P\{X_{(1)} \le k\} = 1 - P\{X_1 > k, X_2 > k, \ldots, X_n > k\} = 1 - [1 - F(k)]^n,
\]

where F is the common DF of $X_1, X_2, \ldots, X_n$. It follows that

\[
[1 - F(k)] = (1-p)^{k+1},
\]

so that $P\{X_1 > k\} = (1-p)^{k+1}$, which completes the proof.


5.2.6 Hypergeometric Distribution 


A box contains N marbles. Of these, M are drawn at random, marked, and returned to the
box. The contents of the box are then thoroughly mixed. Next, n marbles are drawn at
random from the box, and the marked marbles are counted. If X denotes the number of
marked marbles, then

\[
P\{X = x\} = \binom{N}{n}^{-1} \binom{M}{x} \binom{N-M}{n-x}. \tag{40}
\]

Since x cannot exceed M or n, we must have

\[
x \le \min(M, n). \tag{41}
\]

Also $x \ge 0$ and $N - M \ge n - x$, so that

\[
x \ge \max(0, M + n - N). \tag{42}
\]

Definition 2. An RV X with PMF given by (40) is called a hypergeometric RV.


Note that

\[
\sum_{k=0}^{n} \binom{a}{k} \binom{b}{n-k} = \binom{a+b}{n}
\]

for arbitrary numbers a, b and positive integer n. It follows that

\[
\sum_{x} P\{X = x\} = \binom{N}{n}^{-1} \sum_{x} \binom{M}{x} \binom{N-M}{n-x} = 1.
\]

It is easy to check that

\[
EX = \frac{n}{N} M, \tag{43}
\]

\[
EX^2 = \frac{M(M-1)}{N(N-1)}\, n(n-1) + \frac{nM}{N}, \tag{44}
\]

and

\[
\operatorname{var}(X) = \frac{nM}{N^2(N-1)} (N-M)(N-n). \tag{45}
\]


Example 7. A lot consisting of 50 bulbs is inspected by taking at random 10 bulbs and
testing them. If the number of defective bulbs is at most 1, the lot is accepted; otherwise,
it is rejected. If there are, in fact, 10 defective bulbs in the lot, the probability of accepting
the lot is

\[
\frac{\binom{10}{1}\binom{40}{9}}{\binom{50}{10}} + \frac{\binom{40}{10}}{\binom{50}{10}}.
\]
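
The acceptance probability in Example 7 can be evaluated directly from the hypergeometric PMF. This is a small sketch, assuming N = 50 bulbs, M = 10 defectives, and a sample of n = 10; the function name `hypergeom_pmf` is illustrative.

```python
from math import comb


def hypergeom_pmf(x, N, M, n):
    """P{X = x}: x marked items in a sample of n drawn without replacement
    from N items of which M are marked."""
    if x < max(0, M + n - N) or x > min(M, n):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# Accept the lot when the sample contains at most one defective bulb.
accept = hypergeom_pmf(0, 50, 10, 10) + hypergeom_pmf(1, 50, 10, 10)
print(round(accept, 4))
```

Under these assumptions the lot is accepted a little more than one time in three, even though a fifth of its bulbs are defective.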
Example 8. Suppose that an urn contains b white and c black balls, b + c = N. A ball is
drawn at random, and before drawing the next ball, s + 1 balls of the same color are added
to the urn. The procedure is repeated n times. Let X be the number of white balls drawn
in n draws, X = 0,1,2,...,n. We shall find the PMF of X.
First note that the probability of drawing k white balls in successive draws is

\[
\left(\frac{b}{N}\right)\left(\frac{b+s}{N+s}\right)\left(\frac{b+2s}{N+2s}\right)\cdots\left(\frac{b+(k-1)s}{N+(k-1)s}\right),
\]

and the probability of drawing k white balls in the first k draws and then n - k black balls
in the next n - k draws is

\[
p_k = \frac{b(b+s)\cdots[b+(k-1)s]\; c(c+s)\cdots[c+(n-k-1)s]}{N(N+s)\cdots[N+(n-1)s]}.
\]

Here $p_k$ also gives the probability of drawing k white and n - k black balls in any given
order. It follows that

\[
P\{X = k\} = \binom{n}{k} p_k. \tag{47}
\]

An RV X with PMF given by (47) is said to have a Polya distribution. Let us write

\[
Np = b, \qquad N(1-p) = c, \qquad \text{and} \qquad N\alpha = s.
\]

Then with q = 1 - p, we have

\[
P\{X = k\} = \binom{n}{k} \frac{p(p+\alpha)\cdots[p+(k-1)\alpha]\; q(q+\alpha)\cdots[q+(n-k-1)\alpha]}{1(1+\alpha)\cdots[1+(n-1)\alpha]}.
\]

Let us take s = -1. This means that the ball drawn at each draw is not replaced in the urn
before drawing the next ball. In this case $\alpha = -1/N$, and we have

\[
P\{X = k\} = \binom{n}{k} \frac{Np(Np-1)\cdots[Np-(k-1)]\; c(c-1)\cdots[c-(n-k-1)]}{N(N-1)\cdots[N-(n-1)]}
= \frac{\binom{Np}{k}\binom{Nq}{n-k}}{\binom{N}{n}}, \tag{48}
\]

which is a hypergeometric distribution. Here

\[
\max(0, n - Nq) \le k \le \min(n, Np). \tag{49}
\]
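
A short sketch of the Polya PMF of Example 8 follows, assuming the product form $P\{X=k\} = \binom{n}{k}\, b(b+s)\cdots[b+(k-1)s]\; c(c+s)\cdots[c+(n-k-1)s] \,/\, N(N+s)\cdots[N+(n-1)s]$. The function name `polya_pmf` is hypothetical. With s = -1 the values should agree with the hypergeometric PMF, which gives a convenient check.

```python
from fractions import Fraction
from math import comb


def polya_pmf(k, n, b, c, s):
    """P{X = k}: k white balls in n draws from an urn with b white and c black
    balls, where s + 1 balls of the drawn color are added after each draw."""
    N = b + c
    num = Fraction(1)
    for i in range(k):            # k white draws
        num *= b + i * s
    for i in range(n - k):        # n - k black draws
        num *= c + i * s
    den = Fraction(1)
    for i in range(n):            # urn size before each draw
        den *= N + i * s
    return comb(n, k) * num / den


# s = -1 is sampling without replacement, i.e. the hypergeometric case.
print([polya_pmf(k, 3, 4, 6, -1) for k in range(4)])
```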


Theorem 8. Let X and Y be independent RVs with PMFs b(m, p) and b(n, p), respectively. 
Then the conditional distribution of X, given X + Y, is hypergeometric. 


5.2.7 Negative Hypergeometric Distribution 


Consider the model of Section 5.2.6. A box contains N marbles, M of these are marked (or
say defective) and N - M are unmarked. A sample of size n is taken and let X denote the
number of defective marbles in the sample. If the sample is drawn without replacement
we saw that X has a hypergeometric distribution with PMF (40). If, on the other hand, the
sample is drawn with replacement then X ~ b(n,p) where p = M/N.
Let Y denote the number of draws needed to draw the rth defective marble. If the draws
are made with replacement then Y has the negative binomial distribution given in (22) with
p = M/N. What if the draws are made without replacement? In that case in order that the
kth draw ($k \ge r$) be the rth defective marble drawn, the kth draw must produce a defective
marble, whereas the previous k - 1 draws must produce r - 1 defectives. It follows that

\[
P(Y = k) = \frac{\binom{M}{r-1}\binom{N-M}{k-r}}{\binom{N}{k-1}} \cdot \frac{M-r+1}{N-k+1}
\]

for k = r, r+1, ..., N. Rewriting we see that

\[
P(Y = k) = \frac{\binom{k-1}{r-1}\binom{N-k}{M-r}}{\binom{N}{M}}. \tag{50}
\]

An RV Y with PMF (50) is said to have a negative hypergeometric distribution.


It is easy to see that

\[
EY = r\,\frac{N+1}{M+1}, \qquad EY(Y+1) = \frac{r(r+1)(N+1)(N+2)}{(M+1)(M+2)},
\]

and

\[
\operatorname{var}(Y) = \frac{r(N-M)(N+1)(M+1-r)}{(M+1)^2(M+2)}.
\]

Also, if $r/N \to 0$, and $k/N \to 0$ as $N \to \infty$, then

\[
\frac{\binom{k-1}{r-1}\binom{N-k}{M-r}}{\binom{N}{M}} \to \binom{k-1}{r-1} \left(\frac{M}{N}\right)^{r} \left(1 - \frac{M}{N}\right)^{k-r},
\]

which is (22).
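
The negative hypergeometric PMF and its mean can be checked with exact arithmetic. The sketch below assumes the form $P(Y=k) = \binom{k-1}{r-1}\binom{N-k}{M-r}/\binom{N}{M}$ and the mean $r(N+1)/(M+1)$; the function name `neg_hypergeom_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb


def neg_hypergeom_pmf(k, N, M, r):
    """P(Y = k): the r-th marked item appears on draw k when drawing without
    replacement from N items of which M are marked."""
    if k < r or k > N:
        return Fraction(0)
    return Fraction(comb(k - 1, r - 1) * comb(N - k, M - r), comb(N, M))


N, M, r = 10, 4, 2
pmf = {k: neg_hypergeom_pmf(k, N, M, r) for k in range(r, N + 1)}
mean = sum(k * pk for k, pk in pmf.items())
print(sum(pmf.values()), mean)
```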


5.2.8 Poisson Distribution 


Definition 3. An RV X is said to be a Poisson RV with parameter $\lambda > 0$ if its PMF is
given by

\[
P\{X = k\} = \frac{e^{-\lambda} \lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots. \tag{51}
\]

We first check to see that (51) indeed defines a PMF. We have

\[
\sum_{k=0}^{\infty} P\{X = k\} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1.
\]

If X has the PMF given by (51), we will write $X \sim P(\lambda)$. Clearly,

\[
X = \sum_{k=0}^{\infty} k\, I_{[X = k]}.
\]

The mean and the variance are given by (see Problem 3.2.9)

\[
EX = \lambda, \qquad EX^2 = \lambda + \lambda^2, \tag{52}
\]

and

\[
\operatorname{var}(X) = \lambda. \tag{53}
\]

The MGF of X is given by (see Example 3.3.7)

\[
E e^{tX} = \exp\{\lambda(e^t - 1)\}, \tag{54}
\]

and the PGF by $P(s) = e^{-\lambda(1-s)}$, $|s| \le 1$.


Theorem 9. Let $X_1, X_2, \ldots, X_n$ be independent Poisson RVs with $X_k \sim P(\lambda_k)$, k =
1,2,...,n. Then $S_n = X_1 + X_2 + \cdots + X_n$ is a $P(\lambda_1 + \lambda_2 + \cdots + \lambda_n)$ RV.


The converse of Theorem 9 is also true. Indeed, Raikov [84] showed that if
$X_1, X_2, \ldots, X_n$ are independent and $S_n = \sum_{i=1}^{n} X_i$ has a Poisson distribution, each of the
RVs $X_1, X_2, \ldots, X_n$ has a Poisson distribution.


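Example 9. The number of female insects in a given region follows a Poisson distribution
with mean $\lambda$. The number of eggs laid by each insect is a $P(\mu)$ RV. We are interested in
the probability distribution of the number of eggs in the region.

Let F be the number of female insects in the given region. Then

\[
P\{F = f\} = \frac{e^{-\lambda} \lambda^f}{f!}, \qquad f = 0, 1, 2, \ldots.
\]

Let Y be the total number of eggs laid by the F insects. Given F = f, Y is the sum of f
independent $P(\mu)$ RVs and hence, by Theorem 9, a $P(f\mu)$ RV. Then

\[
P\{Y = y, F = f\} = P\{F = f\}\, P\{Y = y \mid F = f\}
= \frac{e^{-\lambda} \lambda^f}{f!} \cdot \frac{(f\mu)^y e^{-\mu f}}{y!}.
\]

Thus

\[
P\{Y = y\} = \frac{e^{-\lambda}}{y!} \sum_{f=0}^{\infty} \frac{\lambda^f (f\mu)^y e^{-\mu f}}{f!}.
\]

The MGF of Y is given by

\[
M(t) = \sum_{f=0}^{\infty} \frac{\lambda^f e^{-\lambda}}{f!} \sum_{y=0}^{\infty} \frac{e^{ty} (f\mu)^y e^{-\mu f}}{y!}
= \sum_{f=0}^{\infty} \frac{\lambda^f e^{-\lambda}}{f!} \exp\{f\mu(e^t - 1)\}
= e^{-\lambda} \sum_{f=0}^{\infty} \frac{[\lambda e^{\mu(e^t - 1)}]^f}{f!}
= e^{-\lambda} \exp\{\lambda e^{\mu(e^t - 1)}\}.
\]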


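Theorem 10. Let X and Y be independent RVs with PMFs $P(\lambda_1)$ and $P(\lambda_2)$, respectively.
Then the conditional distribution of X, given X + Y, is binomial.


Proof. For nonnegative integers m and n, $m \le n$, we have

\[
P\{X = m \mid X + Y = n\} = \frac{P\{X = m,\, Y = n-m\}}{P\{X + Y = n\}}
= \frac{e^{-\lambda_1}(\lambda_1^m / m!)\; e^{-\lambda_2}(\lambda_2^{n-m} / (n-m)!)}{e^{-(\lambda_1+\lambda_2)} (\lambda_1+\lambda_2)^n / n!}
\]

\[
= \binom{n}{m} \frac{\lambda_1^m \lambda_2^{n-m}}{(\lambda_1+\lambda_2)^n}
= \binom{n}{m} \left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^{m} \left(1 - \frac{\lambda_1}{\lambda_1+\lambda_2}\right)^{n-m},
\qquad m = 0, 1, 2, \ldots, n,
\]

and the proof is complete.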
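
Theorem 10 can be checked numerically for particular parameter values. The sketch below, with hypothetical function names, compares the conditional PMF of X given X + Y = n (computed from Poisson PMFs with assumed means `lam1` and `lam2`) with the binomial PMF with success probability `lam1 / (lam1 + lam2)`.

```python
from math import comb, exp, factorial, isclose


def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)


def conditional_pmf(m, n, lam1, lam2):
    """P{X = m | X + Y = n} for independent X ~ P(lam1), Y ~ P(lam2)."""
    return poisson_pmf(m, lam1) * poisson_pmf(n - m, lam2) / poisson_pmf(n, lam1 + lam2)


def binomial_pmf(m, n, p):
    return comb(n, m) * p ** m * (1 - p) ** (n - m)


lam1, lam2, n = 1.5, 2.5, 6
for m in range(n + 1):
    print(m, conditional_pmf(m, n, lam1, lam2), binomial_pmf(m, n, lam1 / (lam1 + lam2)))
```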


Remark 2. The converse of this result is also true in the following sense. If X and Y are 
independent nonnegative integer-valued RVs such that P{X = k} > 0, P{Y =k} > 0, for 
k=0,1,2,..., and the conditional distribution of X, given X + Y, is binomial, both X and 
Y are Poisson. This result is due to Chatterji [13]. For the proof see Problem 13. 


Theorem 11. If $X \sim P(\lambda)$ and the conditional distribution of Y, given X = x, is b(x,p),
then Y is a $P(\lambda p)$ RV.


Example 10. (Lamperti and Kruskal [60]). Let N be a nonnegative integer-valued RV.
Independently of each other, N balls are placed either in urn A with probability p (0 < p <
1) or in urn B with probability 1 - p, resulting in $N_A$ balls in urn A and $N_B = N - N_A$ balls
in urn B. We will show that the RVs $N_A$ and $N_B$ are independent if and only if N has a
Poisson distribution. We have

\[
P\{N_A = a \text{ and } N_B = b \mid N = a+b\} = \binom{a+b}{a} p^a (1-p)^b,
\]

where a, b are integers $\ge 0$. Thus

\[
P\{N_A = a,\, N_B = b\} = \binom{a+b}{a} p^a q^b\, P\{N = n\}, \qquad q = 1-p, \quad n = a+b.
\]

If N has a Poisson ($\lambda$) distribution, then

\[
P\{N_A = a,\, N_B = b\} = \frac{(a+b)!}{a!\, b!}\, p^a q^b\, \frac{e^{-\lambda} \lambda^{a+b}}{(a+b)!}
= \left(\frac{p^a \lambda^a}{a!} e^{-\lambda/2}\right) \left(\frac{q^b \lambda^b}{b!} e^{-\lambda/2}\right),
\]

so that $N_A$ and $N_B$ are independent.
Conversely, if $N_A$ and $N_B$ are independent, then

\[
P\{N = n\}\, n! = f(a) g(b)
\]

for some functions f and g. Clearly, $f(0) \ne 0$, $g(0) \ne 0$ because $P\{N_A = 0, N_B = 0\} > 0$.
Thus there is a function h such that h(a+b) = f(a)g(b) for all nonnegative integers a, b.
It follows that

\[
f(a) = f(0) \left[\frac{g(1)}{g(0)}\right]^{a},
\]

and similarly $g(b) = g(0)[f(1)/f(0)]^b$, where $f(1)/f(0) = g(1)/g(0)$ because $h(1) = f(1)g(0) = f(0)g(1)$.
We may write, for some $\alpha_1$, $\alpha_2$, $\lambda$,

\[
f(a) = \alpha_1 e^{-a\lambda}, \qquad g(b) = \alpha_2 e^{-b\lambda}, \qquad
P\{N = n\} = \alpha_1 \alpha_2 \frac{e^{-\lambda(a+b)}}{(a+b)!},
\]

so that N is a Poisson RV.
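
The "if" direction of Example 10 can be illustrated numerically. The sketch below assumes N is Poisson with mean `lam` and checks that the joint PMF of the two urn counts factors into the product of a Poisson PMF with mean `lam * p` and one with mean `lam * (1 - p)`; all function names are illustrative.

```python
from math import comb, exp, factorial, isclose


def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)


def joint_pmf(a, b, lam, p):
    """P{N_A = a, N_B = b} when N ~ Poisson(lam) and each ball goes to urn A
    with probability p, independently of the others."""
    return comb(a + b, a) * p ** a * (1 - p) ** b * poisson_pmf(a + b, lam)


def product_pmf(a, b, lam, p):
    """Product of the Poisson(lam * p) and Poisson(lam * (1 - p)) PMFs."""
    return poisson_pmf(a, lam * p) * poisson_pmf(b, lam * (1 - p))


lam, p = 3.0, 0.3
print(joint_pmf(2, 1, lam, p), product_pmf(2, 1, lam, p))
```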


5.2.9 Multinomial Distribution 


The binomial distribution is generalized in the following natural fashion. Suppose that an
experiment is repeated n times. Each replication of the experiment terminates in one of k
mutually exclusive and exhaustive events $A_1, A_2, \ldots, A_k$. Let $p_j$ be the probability that the
experiment terminates in $A_j$, j = 1,2,...,k, and suppose that $p_j$ (j = 1,2,...,k) remains
constant for all n replications. We assume that the n replications are independent.
Let $x_1, x_2, \ldots, x_{k-1}$ be nonnegative integers such that $x_1 + x_2 + \cdots + x_{k-1} \le n$. Then the
probability that exactly $x_i$ trials terminate in $A_i$, i = 1,2,...,k-1 and hence that $x_k =
n - (x_1 + x_2 + \cdots + x_{k-1})$ trials terminate in $A_k$ is clearly

\[
\frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}.
\]

If $(X_1, X_2, \ldots, X_k)$ is a random vector such that $X_j = x_j$ means that event $A_j$ has occurred $x_j$
times, $x_j = 0, 1, 2, \ldots, n$, the joint PMF of $(X_1, X_2, \ldots, X_k)$ is given by

\[
P\{X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k\} =
\begin{cases}
\dfrac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} & \text{if } n = \sum_{i=1}^{k} x_i, \\[1ex]
0 & \text{otherwise.}
\end{cases} \tag{55}
\]


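Definition 4. An RV $(X_1, X_2, \ldots, X_{k-1})$ with joint PMF given by

\[
P\{X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1}\} =
\begin{cases}
\dfrac{n!}{x_1!\, x_2! \cdots x_{k-1}!\, (n - x_1 - \cdots - x_{k-1})!}\, p_1^{x_1} p_2^{x_2} \cdots p_{k-1}^{x_{k-1}}\, p_k^{\,n - x_1 - \cdots - x_{k-1}} & \text{if } x_1 + x_2 + \cdots + x_{k-1} \le n, \\[1ex]
0 & \text{otherwise}
\end{cases} \tag{56}
\]

is said to have a multinomial distribution.
For the MGF of $(X_1, X_2, \ldots, X_{k-1})$ we have

\[
M(t_1, t_2, \ldots, t_{k-1}) = E e^{t_1 X_1 + t_2 X_2 + \cdots + t_{k-1} X_{k-1}}
= \sum_{\substack{x_1, x_2, \ldots, x_{k-1} = 0 \\ x_1 + x_2 + \cdots + x_{k-1} \le n}}^{n} e^{t_1 x_1 + \cdots + t_{k-1} x_{k-1}}\, \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}
\]

\[
= \sum_{\substack{x_1, x_2, \ldots, x_{k-1} = 0 \\ x_1 + x_2 + \cdots + x_{k-1} \le n}}^{n} \frac{n!}{x_1!\, x_2! \cdots x_k!}\, (p_1 e^{t_1})^{x_1} (p_2 e^{t_2})^{x_2} \cdots (p_{k-1} e^{t_{k-1}})^{x_{k-1}}\, p_k^{x_k}
\]

\[
= (p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_{k-1} e^{t_{k-1}} + p_k)^n \tag{57}
\]

for all $t_1, t_2, \ldots, t_{k-1} \in \mathbb{R}$, where $x_k = n - x_1 - \cdots - x_{k-1}$.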


Clearly,

\[
M(t_1, 0, 0, \ldots, 0) = (p_1 e^{t_1} + p_2 + \cdots + p_k)^n = (1 - p_1 + p_1 e^{t_1})^n,
\]

which is binomial. Indeed, the marginal PMF of each $X_i$, i = 1,2,...,k-1, is binomial.
Similarly, the joint MGF of $X_i$, $X_j$, i, j = 1,2,...,k-1 ($i \ne j$), is

\[
M(0, 0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0) = [p_i e^{t_i} + p_j e^{t_j} + (1 - p_i - p_j)]^n,
\]

which is the MGF of a trinomial distribution with PMF

\[
f(x_i, x_j) = \frac{n!}{x_i!\, x_j!\, (n - x_i - x_j)!}\, p_i^{x_i} p_j^{x_j} p_k^{\,n - x_i - x_j}, \qquad p_k = 1 - p_i - p_j. \tag{58}
\]

Note that the RVs $X_1, X_2, \ldots, X_{k-1}$ are dependent.
From the MGF of $(X_1, X_2, \ldots, X_{k-1})$ or directly from the marginal PMFs we can
compute the moments. Thus

\[
EX_j = np_j \quad \text{and} \quad \operatorname{var}(X_j) = np_j(1 - p_j), \qquad j = 1, 2, \ldots, k-1, \tag{59}
\]

and for j = 1,2,...,k-1, and $i \ne j$,

\[
\operatorname{cov}(X_i, X_j) = E\{(X_i - np_i)(X_j - np_j)\} = -np_i p_j. \tag{60}
\]

It follows that the correlation coefficient between $X_i$ and $X_j$ is given by

\[
\rho_{ij} = -\left[\frac{p_i p_j}{(1 - p_i)(1 - p_j)}\right]^{1/2}, \qquad i, j = 1, 2, \ldots, k-1 \ (i \ne j). \tag{61}
\]
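
The moment formulas for the multinomial distribution can be confirmed by brute-force enumeration for a small case. This is a sketch under the assumption that $\operatorname{var}(X_j) = np_j(1-p_j)$ and $\operatorname{cov}(X_i, X_j) = -np_ip_j$; the helper names are hypothetical and the enumeration is only practical for small n and k.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pmf(xs, ps):
    """Joint PMF of the cell counts xs (summing to n) with cell probabilities ps."""
    coef = factorial(sum(xs))
    for x in xs:
        coef //= factorial(x)
    prob = Fraction(coef)
    for x, p in zip(xs, ps):
        prob *= p ** x
    return prob


def covariance(n, ps, i, j):
    """cov(X_i, X_j) by summing over every outcome; i == j gives the variance."""
    e_i = e_j = e_ij = Fraction(0)
    for xs in product(range(n + 1), repeat=len(ps)):
        if sum(xs) != n:
            continue
        w = multinomial_pmf(xs, ps)
        e_i += xs[i] * w
        e_j += xs[j] * w
        e_ij += xs[i] * xs[j] * w
    return e_ij - e_i * e_j


ps = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
print(covariance(4, ps, 0, 1), covariance(4, ps, 0, 0))
```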


Example 11. Consider the trinomial distribution with PMF

\[
P\{X = x, Y = y\} = \frac{n!}{x!\, y!\, (n-x-y)!}\, p_1^{x} p_2^{y} p_3^{\,n-x-y},
\]

where x, y are nonnegative integers such that $x + y \le n$, and $p_1, p_2, p_3 \ge 0$ with $p_1 + p_2 +
p_3 = 1$. The marginal PMF of X is given by

\[
P\{X = x\} = \binom{n}{x} p_1^{x} (1 - p_1)^{n-x}, \qquad x = 0, 1, 2, \ldots, n.
\]

It follows that

\[
P\{Y = y \mid X = x\} =
\begin{cases}
\dfrac{(n-x)!}{y!\, (n-x-y)!} \left(\dfrac{p_2}{1-p_1}\right)^{y} \left(\dfrac{p_3}{1-p_1}\right)^{n-x-y} & \text{if } y = 0, 1, 2, \ldots, n-x, \\[1ex]
0 & \text{otherwise,}
\end{cases} \tag{62}
\]

which is $b(n - x,\, p_2/(1 - p_1))$. Thus

\[
E\{Y \mid x\} = (n - x)\, \frac{p_2}{1 - p_1}. \tag{63}
\]

Similarly,

\[
E\{X \mid y\} = (n - y)\, \frac{p_1}{1 - p_2}. \tag{64}
\]

Finally, we note that, if $X = (X_1, X_2, \ldots, X_k)$ and $Y = (Y_1, Y_2, \ldots, Y_k)$ are two independent
multinomial RVs with common parameter $(p_1, p_2, \ldots, p_k)$, then Z = X + Y is also a
multinomial RV with probabilities $(p_1, p_2, \ldots, p_k)$. This follows easily if one employs the
MGF technique, using (57). Actually this property characterizes the multinomial distribution.
If X and Y are k-dimensional, nonnegative, independent random vectors, and if
Z = X + Y is a multinomial random vector with parameter $(p_1, p_2, \ldots, p_k)$, then X and Y
also have multinomial distribution with the same parameter. This result is due to Shanbhag
and Basawa [103] and will not be proved here.


5.2.10 Multivariate Hypergeometric Distribution 


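Consider an urn containing N items divided into k categories containing $n_1, n_2, \ldots, n_k$
items, where $\sum_{j=1}^{k} n_j = N$. A random sample, without replacement, of size n is taken
from the urn. Let $X_i$ = number of items in sample of type i. Then

\[
P\{X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k\} = \prod_{j=1}^{k} \binom{n_j}{x_j} \Big/ \binom{N}{n}, \tag{65}
\]

where $x_j = 0, 1, \ldots, \min(n, n_j)$ and $\sum_{j=1}^{k} x_j = n$.

We say that $(X_1, X_2, \ldots, X_{k-1})$ has multivariate hypergeometric distribution if its joint
PMF is given by (65). It is clear that each $X_j$ has a marginal hypergeometric distribution.
Moreover, the conditional distributions are also hypergeometric. Thus

\[
P\{X_i = x_i \mid X_j = x_j\} = \frac{\binom{n_i}{x_i} \binom{N - n_i - n_j}{n - x_i - x_j}}{\binom{N - n_j}{n - x_j}}
\]

and

\[
P\{X_i = x_i \mid X_j = x_j, X_\ell = x_\ell\} = \frac{\binom{n_i}{x_i} \binom{N - n_i - n_j - n_\ell}{n - x_i - x_j - x_\ell}}{\binom{N - n_j - n_\ell}{n - x_j - x_\ell}},
\]

and so on. It is therefore easy to write down the marginal and conditional means and
variances. We leave the reader to show that

\[
EX_j = n\, \frac{n_j}{N},
\]

\[
\operatorname{var}(X_j) = n\, \frac{n_j}{N} \left(\frac{N - n_j}{N}\right) \left(\frac{N - n}{N - 1}\right),
\]

and

\[
\operatorname{cov}(X_i, X_j) = -n\, \frac{n_i n_j}{N^2} \left(\frac{N - n}{N - 1}\right).
\]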


5.2.11 Multivariate Negative Binomial Distribution 


Consider the setup of Section 5.2.9 where each replication of an experiment terminates
in one of k mutually exclusive and exhaustive events $A_1, A_2, \ldots, A_k$. Let $p_j = P(A_j)$,
j = 1,2,...,k. Suppose the experiment is repeated until event $A_k$ is observed for the rth
time, $r \ge 1$. Then, with $X_j$ denoting the number of times $A_j$ is observed, j = 1,2,...,k-1,

\[
P(X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1})
= \frac{(x_1 + x_2 + \cdots + x_{k-1} + r - 1)!}{\left(\prod_{j=1}^{k-1} x_j!\right) (r-1)!}\; p_k^{\,r} \prod_{j=1}^{k-1} p_j^{x_j}, \tag{66}
\]

for $x_i = 0, 1, 2, \ldots$ (i = 1,2,...,k-1), $1 \le r < \infty$, $0 < p_i < 1$, $\sum_{j=1}^{k-1} p_j < 1$, and $p_k =
1 - \sum_{j=1}^{k-1} p_j$.

We say that $(X_1, X_2, \ldots, X_{k-1})$ has a multivariate negative binomial (or negative
multinomial) distribution if its joint PMF is given by (66).

It is easy to see the marginal PMF of any subset of $\{X_1, X_2, \ldots, X_{k-1}\}$ is negative
multinomial. In particular, each $X_i$ has a negative binomial distribution.

We will leave the reader to show that

\[
M(s_1, s_2, \ldots, s_{k-1}) = E\, e^{\sum_{j=1}^{k-1} s_j X_j} = p_k^{\,r} \left(1 - \sum_{j=1}^{k-1} e^{s_j} p_j\right)^{-r}, \tag{67}
\]

and

\[
\operatorname{cov}(X_i, X_j) = \frac{r\, p_i p_j}{p_k^2}. \tag{68}
\]
PROBLEMS 5.2 


1. (a) Let us write

\[
b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n.
\]

Show that, as k goes from 0 to n, b(k;n,p) first increases monotonically and then
decreases monotonically. The greatest value is assumed when k = m, where m
is an integer such that

\[
(n+1)p - 1 < m \le (n+1)p
\]

except that b(m-1;n,p) = b(m;n,p) when m = (n+1)p.
(b) If $k \ge np$, then

\[
P\{X \ge k\} \le b(k; n, p)\, \frac{(k+1)(1-p)}{k + 1 - (n+1)p},
\]

and if $k \le np$, then

\[
P\{X \le k\} \le b(k; n, p)\, \frac{(n-k+1)p}{(n+1)p - k}.
\]

2. Generalize the result in Theorem 10 to n independent Poisson RVs, that is, if
$X_1, X_2, \ldots, X_n$ are independent RVs with $X_i \sim P(\lambda_i)$, i = 1,2,...,n, the conditional
distribution of $X_1, X_2, \ldots, X_n$, given $\sum_{i=1}^{n} X_i = t$, is multinomial with parameters t,
$\lambda_1 / \sum_{i=1}^{n} \lambda_i, \ldots, \lambda_n / \sum_{i=1}^{n} \lambda_i$.

3. Let $X_1, X_2$ be independent RVs with $X_i \sim b(n_i, \frac{1}{2})$, i = 1,2. What is the PMF of
$X_1 - X_2 + n_2$?

4. A box contains N identical balls numbered 1 through N. Of these balls, n are drawn at
a time. Let $X_1, X_2, \ldots, X_n$ denote the numbers on the n balls drawn. Let $S_n = \sum_{i=1}^{n} X_i$.
Find var($S_n$).

5. From a box containing N identical balls marked 1 through N, M balls are drawn one
after another without replacement. Let $X_i$ denote the number on the ith ball drawn,
i = 1,2,...,M, $1 \le M \le N$. Let $Y = \max(X_1, X_2, \ldots, X_M)$. Find the DF and the PMF
of Y. Also find the conditional distribution of $X_1, X_2, \ldots, X_M$, given Y = y. Find EY
and var(Y).

6. Let f(x;r,p), x = 0,1,2,..., denote the PMF of an NB(r;p) RV. Show that the terms
f(x;r,p) first increase monotonically and then decrease monotonically. When is the
greatest value assumed?

7. Show that the terms

\[
P\{X = k\} = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots,
\]

of the Poisson PMF reach their maxima when k is the largest integer $\le \lambda$ and at
$(\lambda - 1)$ and $\lambda$ if $\lambda$ is an integer.

8. Show that

\[
\binom{n}{k} p^k (1-p)^{n-k} \to e^{-\lambda} \frac{\lambda^k}{k!}
\]

as $n \to \infty$ and $p \to 0$, so that $np = \lambda$ remains constant.
[Hint: Use Stirling's approximation, namely, $n! \approx \sqrt{2\pi}\, n^{n+1/2} e^{-n}$ as $n \to \infty$.]

9. A biased coin is tossed indefinitely. Let p (0 < p < 1) be the probability of success
(heads). Let $Y_1$ denote the length of the first run, and $Y_2$, the length of the second
run. Find the PMFs of $Y_1$ and $Y_2$ and show that $EY_1 = q/p + p/q$, $EY_2 = 2$. If $Y_n$
denotes the length of the nth run, $n \ge 1$, what is the PMF of $Y_n$? Find $EY_n$.

10. Show that

\[
\binom{Np}{k} \binom{N(1-p)}{n-k} \Big/ \binom{N}{n} \to \binom{n}{k} p^k (1-p)^{n-k}
\]

as $N \to \infty$.

11. Show that

\[
\binom{r+k-1}{k} p^r (1-p)^k \to e^{-\lambda} \frac{\lambda^k}{k!}
\]

as $p \to 1$ and $r \to \infty$ in such a way that $r(1-p) = \lambda$ remains fixed.

12. Let X and Y be independent geometric RVs. Show that min(X,Y) and X - Y are
independent.

13. Let X and Y be independent RVs with PMFs $P\{X = k\} = p_k$, $P\{Y = k\} = q_k$,
k = 0,1,2,..., where $p_k, q_k > 0$ and $\sum_{k=0}^{\infty} p_k = \sum_{k=0}^{\infty} q_k = 1$. Let

\[
P\{X = k \mid X + Y = t\} = \binom{t}{k} \alpha_t^{k} (1 - \alpha_t)^{t-k}, \qquad 0 \le k \le t.
\]

Then $\alpha_t = \alpha$ for all t, and

\[
p_k = \frac{e^{-\theta\beta} (\theta\beta)^k}{k!}, \qquad q_k = \frac{e^{-\theta} \theta^k}{k!},
\]

where $\beta = \alpha/(1-\alpha)$, and $\theta > 0$ is arbitrary. (Chatterji [13])

14. Generalize the result of Example 10 to the case of k urns, $k \ge 3$.

15. Let $(X_1, X_2, \ldots, X_{k-1})$ have a multinomial distribution with parameters n, $p_1, p_2, \ldots,
p_{k-1}$. Write

\[
Y = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i},
\]

where $p_k = 1 - p_1 - \cdots - p_{k-1}$, and $X_k = n - X_1 - \cdots - X_{k-1}$. Find EY and var(Y).

16. Let $X_1, X_2$ be iid RVs with common DF F, having positive mass at 0,1,2,.... Also,
let $U = \max(X_1, X_2)$ and $V = X_1 - X_2$. Then

\[
P\{U = j,\, V = 0\} = P\{U = j\}\, P\{V = 0\}
\]

for all j if and only if F is a geometric distribution. (Srivastava [109])

17. Let X and Y be mutually independent RVs, taking nonnegative integer values. Then

\[
P\{X \le n\} - P\{X + Y \le n\} = \alpha P\{X + Y = n\}
\]

holds for n = 0,1,2,... and some $\alpha > 0$ if and only if

\[
P\{Y = n\} = \frac{1}{1+\alpha} \left(\frac{\alpha}{1+\alpha}\right)^{n}, \qquad n = 0, 1, 2, \ldots.
\]

[Hint: Use Problem 3.3.8.] (Puri [83])

18. Let $X_1, X_2, \ldots$ be a sequence of independent b(1,p) RVs with 0 < p < 1. Also, let
$Z_N = \sum_{i=1}^{N} X_i$, where N is a $P(\lambda)$ RV which is independent of the $X_i$'s. Show that
$Z_N$ and $N - Z_N$ are independent.

19. Prove Theorems 5, 7, 8, and 11.

20. In Example 2 show that

(a)

\[
P(X_{(1)} = k) = p q^{2k-2} (1 + q), \qquad k = 1, 2, \ldots.
\]

(b)

\[
P(X_{(2)} - X_{(1)} = k) =
\begin{cases}
\dfrac{p}{1+q} & \text{for } k = 0, \\[1ex]
\dfrac{2pq^k}{1+q} & \text{for } k = 1, 2, \ldots.
\end{cases}
\]


5.3. SOME CONTINUOUS DISTRIBUTIONS 


In this section we study some of the most frequently used absolutely continuous distributions and
describe their important properties. Before we introduce specific distributions it should
be remarked that associated with each PDF f there is an index or a parameter $\theta$ (may be
multidimensional) which takes values in an index set $\Theta$. For any particular choice of $\theta \in \Theta$
we obtain a specific PDF $f_\theta$ from the family of PDFs $\{f_\theta, \theta \in \Theta\}$.

Let X be an RV with PDF $f_\theta(x)$, where $\theta$ is a real-valued parameter. We say that $\theta$ is
a location parameter and $\{f_\theta\}$ is a location family if $X - \theta$ has PDF f(x) which does not
depend on $\theta$. The parameter $\theta$ is said to be a scale parameter and $\{f_\theta\}$ is a scale family of
PDFs if $X/\theta$ has PDF f(x) which is free of $\theta$. If $\theta = (\mu, \sigma)$ is two-dimensional, we say that
$\theta$ is a location-scale parameter if the PDF of $(X - \mu)/\sigma$ is free of $\mu$ and $\sigma$. In that case
$\{f_\theta\}$ is known as a location-scale family.

It is easily seen that $\theta$ is a location parameter if and only if $f_\theta(x) = f(x - \theta)$, a
scale parameter if and only if $f_\theta(x) = (1/\theta) f(x/\theta)$, and a location-scale parameter if $f_\theta(x) =
(1/\sigma) f((x - \mu)/\sigma)$, $\sigma > 0$ for some PDF f. The density f is called the standard PDF for
the family $\{f_\theta, \theta \in \Theta\}$.

A location parameter simply relocates or shifts the graph of PDF f without changing
its shape. A scale parameter stretches (if $\theta > 1$) or contracts (if $\theta < 1$) the graph of f.
A location-scale parameter, on the other hand, stretches or contracts the graph of f with
the scale parameter and then shifts the graph to locate at $\mu$ (see Fig. 1).

Some PDFs also have a shape parameter. Changing its value alters the shape of the
graph. For the Poisson distribution $\lambda$ is a shape parameter.

For the following PDF

\[
f(x; \mu, \beta, \alpha) = \frac{1}{\beta\, \Gamma(\alpha)} \left(\frac{x - \mu}{\beta}\right)^{\alpha - 1} \exp\{-(x - \mu)/\beta\}, \qquad x > \mu,
\]

and = 0 otherwise, $\mu$ is a location, $\beta$, a scale, and $\alpha$, a shape parameter. The standard
density for this location-scale family is

\[
f(x) = \frac{1}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x}, \qquad x > 0,
\]

and = 0 otherwise. For the standard PDF f, $\alpha$ is a shape parameter.

Fig. 1 (a) Exponential location family; (b) exponential scale family; (c) normal location-scale
family; and (d) shape parameter family $f_\theta(x) = \theta x^{\theta - 1}$.

5.3.1 Uniform Distribution (Rectangular Distribution)


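Definition 1. An RV X is said to have a uniform distribution on the interval [a,b],
$-\infty < a < b < \infty$ if its PDF is given by

\[
f(x) =
\begin{cases}
\dfrac{1}{b-a}, & a \le x \le b, \\[1ex]
0, & \text{otherwise.}
\end{cases} \tag{1}
\]

We will write X ~ U[a,b] if X has a uniform distribution on [a,b].
The end point a or b or both may be excluded. Clearly,

\[
\int_{-\infty}^{\infty} f(x)\, dx = 1,
\]

so that (1) indeed defines a PDF. The DF of X is given by

\[
F(x) =
\begin{cases}
0, & x < a, \\[0.5ex]
\dfrac{x-a}{b-a}, & a \le x < b, \\[1ex]
1, & b \le x;
\end{cases} \tag{2}
\]

\[
EX = \frac{a+b}{2}, \qquad EX^k = \frac{b^{k+1} - a^{k+1}}{(k+1)(b-a)}, \quad k > 0 \text{ is an integer;} \tag{3}
\]

\[
\operatorname{var}(X) = \frac{(b-a)^2}{12}; \tag{4}
\]

\[
M(t) = \frac{1}{t(b-a)} \left(e^{tb} - e^{ta}\right), \qquad t \ne 0. \tag{5}
\]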

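Example 1. Let X have PDF given by

\[
f(x) =
\begin{cases}
\lambda e^{-\lambda x}, & 0 < x < \infty, \quad \lambda > 0, \\
0, & \text{otherwise.}
\end{cases}
\]

Then

\[
F(x) =
\begin{cases}
0, & x \le 0, \\
1 - e^{-\lambda x}, & x > 0.
\end{cases}
\]

Let $Y = F(X) = 1 - e^{-\lambda X}$. The PDF of Y is given by

\[
f_Y(y) = \frac{1}{\lambda} \cdot \frac{1}{1-y} \cdot \lambda \exp\left\{-\lambda \left(-\frac{1}{\lambda} \log(1-y)\right)\right\} = 1, \qquad 0 \le y < 1.
\]

Let us define $f_Y(y) = 1$ at y = 1. Then we see that Y has density function

\[
f_Y(y) =
\begin{cases}
1, & 0 \le y \le 1, \\
0, & \text{otherwise,}
\end{cases}
\]

which is the U[0,1] distribution. That this is not a mere coincidence is shown in the
following theorem.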


Theorem 1 (Probability Integral Transformation). Let X be an RV with a continuous 
DF F. Then F(X) has the uniform distribution on [0, 1]. 


Proof. The proof is left as an exercise. 


The reader is asked to consider what happens in the case where F is the DF of a discrete 
RV. In the converse direction the following result holds. 


Theorem 2. Let F be any DF, and let X be a U[0,1] RV. Then there exists a function h such
that h(X) has DF F, that is,

\[
P\{h(X) \le x\} = F(x) \qquad \text{for all } x \in (-\infty, \infty). \tag{6}
\]

Proof. If F is the DF of a discrete RV Y, let

\[
P\{Y = y_k\} = p_k, \qquad k = 1, 2, \ldots.
\]

Define h as follows:

\[
h(x) =
\begin{cases}
y_1 & \text{if } 0 \le x < p_1, \\
y_2 & \text{if } p_1 \le x < p_1 + p_2, \\
\;\vdots
\end{cases}
\]

Then

\[
P\{h(X) = y_1\} = P\{0 \le X < p_1\} = p_1,
\]

\[
P\{h(X) = y_2\} = P\{p_1 \le X < p_1 + p_2\} = p_2,
\]

and, in general,

\[
P\{h(X) = y_k\} = p_k, \qquad k = 1, 2, \ldots.
\]

Thus h(X) is a discrete RV with DF F.
If F is continuous and strictly increasing, $F^{-1}$ is well defined, and we take $h(X) =
F^{-1}(X)$. We have

\[
P\{h(X) \le x\} = P\{F^{-1}(X) \le x\} = P\{X \le F(x)\} = F(x),
\]

as asserted.
In general, define

\[
F^{-1}(y) = \inf\{x : F(x) \ge y\}, \tag{7}
\]

and let $h(X) = F^{-1}(X)$. Then we have

\[
\{F^{-1}(y) \le x\} = \{y \le F(x)\}. \tag{8}
\]

Indeed, $F^{-1}(y) \le x$ implies that, for every $\varepsilon > 0$, $y \le F(x + \varepsilon)$. Since $\varepsilon > 0$ is arbitrary
and F is continuous on the right, we let $\varepsilon \to 0$ and conclude that $y \le F(x)$. Since $y \le F(x)$
implies $F^{-1}(y) \le x$ by definition (7), it follows that (8) holds generally. Thus

\[
P\{F^{-1}(X) \le x\} = P\{X \le F(x)\} = F(x).
\]


Theorem 2 is quite useful in generating samples with the help of the uniform distribution.


Example 2. Let F be the DF defined by

\[
F(x) =
\begin{cases}
0, & x \le 0, \\
1 - e^{-x}, & x > 0.
\end{cases}
\]

Then the inverse to $y = 1 - e^{-x}$, x > 0, is $x = -\log(1 - y)$, 0 < y < 1. Thus

\[
h(y) = -\log(1 - y),
\]

and $-\log(1 - X)$ has the required distribution, where X is a U[0,1] RV.
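
Theorem 2 is the basis of inverse-transform sampling. The sketch below shows the two cases treated in the proof, assuming the standard exponential DF of Example 2 for the continuous case and the generalized inverse $F^{-1}(u) = \inf\{x : F(x) \ge u\}$ of (7) for a discrete DF. The function names and the example probabilities are illustrative only.

```python
import math
import random


def exp_quantile(u):
    """h(u) = -log(1 - u), the inverse of F(x) = 1 - exp(-x) for 0 <= u < 1."""
    return -math.log(1.0 - u)


def discrete_quantile(u, values, probs):
    """Generalized inverse inf{x : F(x) >= u} of a discrete DF.
    `values` must be sorted in increasing order, with matching `probs`."""
    cum = 0.0
    for v, p in zip(values, probs):
        cum += p
        if u <= cum:
            return v
    return values[-1]  # guards against rounding in the cumulative sum


rng = random.Random(0)  # fixed seed so the run is reproducible
sample = [exp_quantile(rng.random()) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)
print(sample_mean)  # should be close to 1, the mean of the standard exponential
```

The same pattern applies to any DF whose inverse can be evaluated: draw U[0,1] variates and pass them through the quantile function.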


Theorem 3. Let X be an RV defined on [0,1]. If $P\{x < X \le y\}$ depends only on y - x for
all $0 \le x \le y \le 1$, then X is U[0,1].


Proof. Let $P\{x < X \le y\} = f(y - x)$; then $f(x + y) = P\{0 < X \le x + y\} = P\{0 < X \le x\} +
P\{x < X \le x + y\} = f(x) + f(y)$. Note that f is continuous from the right. We have

\[
f(x) = f(x) + f(0),
\]

so that

\[
f(0) = 0.
\]

We will show that f(x) = cx for some constant c. It suffices to prove the result for positive x.
Let m be an integer; then

\[
f(mx) = f(x) + \cdots + f(x) = m f(x).
\]

Letting x = n/m, we get

\[
f\left(m \cdot \frac{n}{m}\right) = m f\left(\frac{n}{m}\right),
\]

so that

\[
f\left(\frac{n}{m}\right) = \frac{1}{m} f(n) = \frac{n}{m} f(1),
\]

for positive integers n and m. Letting f(1) = c, we have proved that

\[
f(x) = cx
\]

for rational numbers x.

To complete the proof we consider the case where x is a positive irrational number.
Then we can find a decreasing sequence of positive rationals $x_1, x_2, \ldots$ such that $x_n \to x$.
Since f is right continuous,

\[
f(x) = \lim_{x_n \downarrow x} f(x_n) = \lim_{x_n \downarrow x} c x_n = cx.
\]

Now, for $0 \le x \le 1$,

\[
F(x) = P\{X \le 0\} + P\{0 < X \le x\} = F(0) + P\{0 < X \le x\} = f(x) = cx, \qquad 0 \le x \le 1.
\]

Since F(1) = 1, we must have c = 1, so that

\[
F(x) = x, \qquad 0 \le x \le 1.
\]

This completes the proof.


5.3.2 Gamma Distribution 


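The integral

\[
\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha - 1} e^{-x}\, dx \tag{9}
\]

converges or diverges according as $\alpha > 0$ or $\le 0$. For $\alpha > 0$ the integral in (9) is called the
gamma function. In particular, if $\alpha = 1$, $\Gamma(1) = 1$. If $\alpha > 1$, integration by parts yields

\[
\Gamma(\alpha) = (\alpha - 1) \int_{0}^{\infty} x^{\alpha - 2} e^{-x}\, dx = (\alpha - 1)\, \Gamma(\alpha - 1). \tag{10}
\]

If $\alpha = n$ is a positive integer, then

\[
\Gamma(n) = (n - 1)!. \tag{11}
\]

Also writing $x = y^2/2$ in $\Gamma(\frac{1}{2})$ we see that

\[
\Gamma\left(\tfrac{1}{2}\right) = \frac{1}{\sqrt{2}} \int_{-\infty}^{\infty} e^{-y^2/2}\, dy.
\]

Now consider the integral $I = \int_{-\infty}^{\infty} e^{-y^2/2}\, dy$. We have

\[
I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left\{-\frac{x^2 + y^2}{2}\right\} dx\, dy,
\]

and changing to polar coordinates we get

\[
I^2 = \int_{0}^{2\pi} \int_{0}^{\infty} r \exp\left(-\frac{r^2}{2}\right) dr\, d\theta = 2\pi.
\]

It follows that $\Gamma(\frac{1}{2}) = \sqrt{\pi}$.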
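
These facts about the gamma function are easy to confirm numerically. The sketch below uses Python's `math.gamma` for the recursion and the value at 1/2, and a composite Simpson rule (the helper `simpson` is written here for illustration) for the Gaussian integral I; the integration range [-10, 10] is an assumption that truncates a negligible tail.

```python
import math


def simpson(f, lo, hi, steps=20000):
    """Composite Simpson rule on [lo, hi]; `steps` must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


# I = integral of exp(-y^2 / 2) over the real line, truncated to [-10, 10].
gauss_integral = simpson(lambda y: math.exp(-y * y / 2), -10.0, 10.0)

print(gauss_integral ** 2, 2 * math.pi)           # I^2 against 2*pi
print(math.gamma(0.5), math.sqrt(math.pi))        # Gamma(1/2) against sqrt(pi)
print(math.gamma(4.5), 3.5 * math.gamma(3.5))     # recursion (10)
```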


Let us write $x = y/\beta$, $\beta > 0$, in the integral in (9). Then

\[
\Gamma(\alpha) = \int_{0}^{\infty} \frac{y^{\alpha - 1}}{\beta^{\alpha}}\, e^{-y/\beta}\, dy, \tag{12}
\]

so that

\[
\int_{0}^{\infty} \frac{1}{\Gamma(\alpha) \beta^{\alpha}}\, y^{\alpha - 1} e^{-y/\beta}\, dy = 1. \tag{13}
\]

Since the integrand in (13) is positive for y > 0, it follows that the function

\[
f(y) =
\begin{cases}
\dfrac{1}{\Gamma(\alpha) \beta^{\alpha}}\, y^{\alpha - 1} e^{-y/\beta}, & 0 < y < \infty, \\[1ex]
0, & y \le 0
\end{cases} \tag{14}
\]

defines a PDF for $\alpha > 0$, $\beta > 0$.


Definition 2. An RV X with PDF defined by (14) is said to have a gamma distribution
with parameters $\alpha$ and $\beta$. We will write $X \sim G(\alpha, \beta)$.


Figure 2 gives graphs of some gamma PDFs.
The DF of a $G(\alpha, \beta)$ RV is given by

\[
F(x) =
\begin{cases}
0, & x \le 0, \\[0.5ex]
\dfrac{1}{\Gamma(\alpha) \beta^{\alpha}} \displaystyle\int_{0}^{x} y^{\alpha - 1} e^{-y/\beta}\, dy, & x > 0.
\end{cases} \tag{15}
\]

Fig. 2 Gamma density functions.


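The MGF of X is easily computed. We have

\[
M(t) = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} \int_{0}^{\infty} e^{x(t - 1/\beta)} x^{\alpha - 1}\, dx
= \left(\frac{1}{1 - \beta t}\right)^{\alpha} \int_{0}^{\infty} \frac{y^{\alpha - 1} e^{-y}}{\Gamma(\alpha)}\, dy, \qquad t < \frac{1}{\beta},
\]

\[
= (1 - \beta t)^{-\alpha}, \qquad t < \frac{1}{\beta}. \tag{16}
\]

It follows that

\[
EX = M'(t)\big|_{t=0} = \alpha\beta, \tag{17}
\]

\[
EX^2 = M''(t)\big|_{t=0} = \alpha(\alpha + 1)\beta^2, \tag{18}
\]

so that

\[
\operatorname{var}(X) = \alpha\beta^2. \tag{19}
\]

Indeed, we can compute the moment of order n such that $\alpha + n > 0$ directly from the
density. We have

\[
EX^n = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} \int_{0}^{\infty} e^{-x/\beta} x^{\alpha + n - 1}\, dx
= \beta^n\, \frac{\Gamma(\alpha + n)}{\Gamma(\alpha)}
= \beta^n (\alpha + n - 1)(\alpha + n - 2) \cdots \alpha. \tag{20}
\]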
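
The moment formula can be compared with the mean and variance quoted above. This is a minimal sketch, assuming $EX^n = \beta^n\, \Gamma(\alpha + n)/\Gamma(\alpha)$ for a $G(\alpha, \beta)$ RV; the function name `gamma_moment` is hypothetical.

```python
import math


def gamma_moment(n, alpha, beta):
    """E X^n for X ~ G(alpha, beta), valid when alpha + n > 0."""
    return beta ** n * math.gamma(alpha + n) / math.gamma(alpha)


alpha, beta = 2.5, 1.5
mean = gamma_moment(1, alpha, beta)
variance = gamma_moment(2, alpha, beta) - mean ** 2
print(mean, alpha * beta)             # compare with (17)
print(variance, alpha * beta ** 2)    # compare with (19)
```

For integer n the ratio of gamma functions reduces to the finite product in (20), which the check for n = 3 below exercises.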


The special case when $\alpha = 1$ leads to the exponential distribution with parameter $\beta$.
The PDF of an exponentially distributed RV is therefore

\[
f(x) =
\begin{cases}
\beta^{-1} e^{-x/\beta}, & x > 0, \\
0, & \text{otherwise.}
\end{cases} \tag{21}
\]

Note that we can speak of the exponential distribution on $(-\infty, 0)$. The PDF of such an
RV is

\[
f(x) =
\begin{cases}
\beta^{-1} e^{x/\beta}, & x < 0, \\
0, & x \ge 0.
\end{cases} \tag{22}
\]

Clearly, if $X \sim G(1, \beta)$, we have

\[
EX^n = n!\, \beta^n, \tag{23}
\]

\[
EX = \beta \quad \text{and} \quad \operatorname{var}(X) = \beta^2, \tag{24}
\]

\[
M(t) = (1 - \beta t)^{-1} \qquad \text{for } t < \beta^{-1}. \tag{25}
\]

Another special case of importance is when $\alpha = n/2$, n > 0 (an integer), and $\beta = 2$.


Definition 3. An RV X is said to have a chi-square distribution ($\chi^2$-distribution) with n
degrees of freedom where n is a positive integer if its PDF is given by

\[
f(x) =
\begin{cases}
\dfrac{1}{\Gamma(n/2)\, 2^{n/2}}\, e^{-x/2} x^{n/2 - 1}, & 0 < x < \infty, \\[1ex]
0, & x \le 0.
\end{cases} \tag{26}
\]

We will write $X \sim \chi^2(n)$ for a $\chi^2$ RV with n degrees of freedom (d.f.). [Note the difference
in the abbreviations of distribution function (DF) and degrees of freedom (d.f.).]


If $X \sim \chi^2(n)$, then

\[
EX = n, \qquad \operatorname{var}(X) = 2n, \tag{27}
\]

\[
EX^k = \frac{2^k\, \Gamma[(n/2) + k]}{\Gamma(n/2)}, \tag{28}
\]

and

\[
M(t) = (1 - 2t)^{-n/2} \qquad \text{for } t < \tfrac{1}{2}. \tag{29}
\]

Theorem 4. Let $X_1, X_2, \ldots, X_n$ be independent RVs such that $X_j \sim G(\alpha_j, \beta)$, j = 1,2,...,n.
Then $S_n = \sum_{k=1}^{n} X_k$ is a $G(\sum_{j=1}^{n} \alpha_j, \beta)$ RV.


Corollary 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs, each with an exponential distribution with
parameter $\beta$. Then $S_n$ is a $G(n, \beta)$ RV.


Corollary 2. If $X_1, X_2, \ldots, X_n$ are independent RVs such that $X_j \sim \chi^2(r_j)$, j = 1,2,...,n,
then $S_n$ is a $\chi^2(\sum_{j=1}^{n} r_j)$ RV.


Theorem 5. Let X ~ U(0,1). Then $Y = -2 \log X$ is $\chi^2(2)$.


Corollary. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common distribution U(0,1). Then
$-2 \sum_{i=1}^{n} \log X_i = 2 \log(1 / \prod_{i=1}^{n} X_i)$ is $\chi^2(2n)$.


Theorem 6. Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent RVs. Then X + Y and
X/Y are independent.


Corollary. Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent RVs. Then X + Y and
X/(X + Y) are independent.


The converse of Theorem 6 is also true. The result is due to Lukacs [68], and we state
it without proof.


Theorem 7. Let X and Y be two nondegenerate RVs that take only positive values. Suppose
that U = X + Y and V = X/Y are independent. Then X and Y have gamma distribution
with the same parameter $\beta$.

Theorem 8. Let $X \sim G(1, \beta)$. Then the RV X has "no memory," that is,

\[
P\{X > r + s \mid X > s\} = P\{X > r\}, \tag{30}
\]

for any two positive real numbers r and s.

Proof. The proof is left as an exercise.

The converse of Theorem 8 is also true in the following sense.

Theorem 9. Let F be a DF such that F(x) = 0 if x < 0, F(x) < 1 if x > 0, and

\[
\frac{1 - F(x + y)}{1 - F(y)} = 1 - F(x) \qquad \text{for all } x, y > 0. \tag{31}
\]

Then there exists a constant $\beta > 0$ such that

\[
1 - F(x) = e^{-x/\beta}, \qquad x > 0. \tag{32}
\]

Proof. Equation (31) is equivalent to

\[
g(x + y) = g(x) + g(y)
\]

if we write $g(x) = \log\{1 - F(x)\}$. From the proof of Theorem 3 it is clear that the only
right continuous solution is g(x) = cx. Hence $F(x) = 1 - e^{cx}$, $x \ge 0$. Since $F(x) \to 1$ as
$x \to \infty$, it follows that c < 0 and the proof is complete.


Theorem 10. Let $X_1, X_2, \ldots, X_n$ be iid RVs. Then $X_i \sim G(1, n\beta)$, i = 1,2,...,n, if and only
if $X_{(1)}$ is $G(1, \beta)$.


Note that if $X_1, X_2, \ldots, X_n$ are independent with $X_i \sim G(1, \beta_i)$, i = 1,2,...,n, then $X_{(1)}$
is a $G\left(1, 1 / \sum_{i=1}^{n} \beta_i^{-1}\right)$ RV.


The following result describes the relationship between exponential and Poisson RVs. 


Theorem 11. Let X;,X2,... be a sequence of iid RVs having common exponential density 
with parameter 6 > 0. Let S, = yi X, be the nth partial sum, n = 1,2,..., and suppose 
that t > 0. If Y = number of S,, € [0,], then Y is a P(t/3) RV. 


Proof. We have

\[ P\{Y = 0\} = P\{S_1 > t\} = \frac{1}{\beta}\int_t^{\infty} e^{-x/\beta}\,dx = e^{-t/\beta}, \]

so that the assertion holds for $Y = 0$. Let $n$ be a positive integer. Since the $X_i$'s are
nonnegative, $S_n$ is nondecreasing, and

\[ P\{Y = n\} = P\{S_n \leq t,\ S_{n+1} > t\}. \tag{33} \]

Now

\[ P\{S_n \leq t\} = P\{S_n \leq t,\ S_{n+1} > t\} + P\{S_{n+1} \leq t\}. \tag{34} \]

It follows that

\[ P\{Y = n\} = P\{S_n \leq t\} - P\{S_{n+1} \leq t\}, \tag{35} \]

and, since $S_n \sim G(n, \beta)$, we have

\[ P\{Y = n\} = \int_0^t \frac{1}{\Gamma(n)\beta^{n}}\, x^{n-1} e^{-x/\beta}\,dx - \int_0^t \frac{1}{\Gamma(n+1)\beta^{n+1}}\, x^{n} e^{-x/\beta}\,dx = \frac{t^{n} e^{-t/\beta}}{\beta^{n}\, n!}, \]

as asserted.
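
Theorem 11 can be illustrated by simulation. The following is a minimal sketch, not part of the original text: the helper names `count_arrivals` and `poisson_pmf`, the seed, and the values $t = 3$, $\beta = 1.5$ are illustrative assumptions. It counts how many partial sums of exponential RVs with mean $\beta$ fall in $[0, t]$ and compares the empirical frequencies with the $P(t/\beta)$ PMF.

```python
import math
import random

def count_arrivals(t, beta, rng):
    """Number of partial sums S_n of iid exponential (mean beta) RVs lying in [0, t]."""
    total, count = 0.0, 0
    while True:
        total += rng.expovariate(1.0 / beta)  # expovariate takes the rate 1/beta
        if total > t:
            return count
        count += 1

def poisson_pmf(n, lam):
    """PMF of a Poisson RV with parameter lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

rng = random.Random(12345)            # fixed seed, chosen arbitrarily
t, beta, trials = 3.0, 1.5, 100_000   # illustrative values; t / beta = 2
counts = [count_arrivals(t, beta, rng) for _ in range(trials)]
empirical = {n: counts.count(n) / trials for n in range(6)}
mean_count = sum(counts) / trials

for n in range(6):
    print(n, round(empirical[n], 4), round(poisson_pmf(n, t / beta), 4))
```

Under these assumptions the empirical column should track the Poisson column closely, and `mean_count` should be near $t/\beta$.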


Theorem 12. If $X$ and $Y$ are independent exponential RVs with parameter $\beta$, then
$Z = X/(X + Y)$ has a $U(0, 1)$ distribution.


Note that, in view of Theorem 7, Theorem 12 characterizes the exponential distribution
in the following sense. Let $X$ and $Y$ be independent RVs that are nondegenerate and take
only positive values. Suppose that $X + Y$ and $X/Y$ are independent. If $X/(X + Y)$ is $U(0, 1)$,
then $X$ and $Y$ both have the exponential distribution with parameter $\beta$. This follows since, by
Theorem 7, $X$ and $Y$ must have gamma distributions, say $G(\alpha_1, \beta)$ and $G(\alpha_2, \beta)$, with the
same parameter $\beta$. Thus $X/(X + Y)$ must have (see Theorem 14) the PDF

\[ f(x) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\, x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}, \quad 0 < x < 1, \]

and this is the uniform density on $(0, 1)$ if and only if $\alpha_1 = \alpha_2 = 1$. Thus $X$ and $Y$ both
have the $G(1, \beta)$ distribution.


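Theorem 13. Let $X$ be a $P(\lambda)$ RV. Then

\[ P\{X \leq K\} = \frac{1}{K!}\int_{\lambda}^{\infty} e^{-x} x^{K}\,dx \tag{36} \]

expresses the DF of $X$ in terms of an incomplete gamma function.


Proof. We have

\[ \frac{d}{d\lambda} P\{X \leq K\} = \sum_{j=0}^{K} \frac{1}{j!}\left\{ j e^{-\lambda}\lambda^{j-1} - \lambda^{j} e^{-\lambda} \right\} = -\frac{\lambda^{K} e^{-\lambda}}{K!}, \]

and it follows that

\[ P\{X \leq K\} = \frac{1}{K!}\int_{\lambda}^{\infty} e^{-x} x^{K}\,dx, \]

as asserted.

An alternative way of writing (36) is the following:

\[ P\{X \leq K\} = P\{Y \geq 2\lambda\}, \]

where $X \sim P(\lambda)$ and $Y \sim \chi^2(2K + 2)$.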
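
Identity (36) is easy to check numerically. The sketch below is illustrative only: the function names, the truncation width of the integral, and the values $\lambda = 2.5$, $K = 4$ are assumptions made here. It compares the Poisson DF with a composite Simpson approximation of the incomplete gamma integral.

```python
import math

def poisson_cdf(k, lam):
    """P{X <= k} for X ~ P(lam), summed directly from the PMF."""
    return sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k + 1))

def upper_gamma_integral(k, lam, width=60.0, steps=20_000):
    """Simpson approximation of (1/k!) * integral over [lam, lam + width] of e^{-x} x^k dx.

    The infinite upper limit is truncated at lam + width; the neglected tail
    is negligible for the moderate k and lam used here.
    """
    h = width / steps

    def g(x):
        return math.exp(-x) * x ** k

    total = g(lam) + g(lam + width)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lam + i * h)
    return total * h / 3 / math.factorial(k)

lam, K = 2.5, 4   # illustrative values
lhs = poisson_cdf(K, lam)
rhs = upper_gamma_integral(K, lam)
print(lhs, rhs)
```

The two printed values should agree to many decimal places, which is what (36) asserts.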


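5.3.3 Beta Distribution
The integral

\[ B(\alpha, \beta) = \int_{0+}^{1-} x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx \tag{37} \]

converges for $\alpha > 0$ and $\beta > 0$ and is called a beta function. For $\alpha \leq 0$ or $\beta \leq 0$ the integral
in (37) diverges. It is easy to see that for $\alpha > 0$ and $\beta > 0$

\[ B(\alpha, \beta) = B(\beta, \alpha), \tag{38} \]

\[ B(\alpha, \beta) = \int_{0+}^{\infty} x^{\alpha - 1}(1 + x)^{-\alpha - \beta}\,dx, \tag{39} \]

and

\[ B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}. \tag{40} \]

It follows that

\[ f(x) = \begin{cases} \dfrac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}, & 0 < x < 1, \\ 0, & \text{otherwise}, \end{cases} \tag{41} \]

defines a PDF.


Definition 4. An RV $X$ with PDF given by (41) is said to have a beta distribution with
parameters $\alpha$ and $\beta$, $\alpha > 0$ and $\beta > 0$. We will write $X \sim B(\alpha, \beta)$ for a beta variable with
density (41).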


Figure 3 gives graphs of some beta PDFs.
The DF of a $B(\alpha, \beta)$ RV is given by

\[ F(x) = \begin{cases} 0, & x \leq 0, \\ [B(\alpha, \beta)]^{-1} \displaystyle\int_{0+}^{x} y^{\alpha - 1}(1 - y)^{\beta - 1}\,dy, & 0 < x < 1, \\ 1, & x \geq 1. \end{cases} \tag{42} \]


Fig. 3 Beta density functions.


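If $n$ is a positive number, then

\[ EX^{n} = \frac{1}{B(\alpha, \beta)}\int_0^1 x^{n + \alpha - 1}(1 - x)^{\beta - 1}\,dx = \frac{B(n + \alpha, \beta)}{B(\alpha, \beta)} = \frac{\Gamma(n + \alpha)\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(n + \alpha + \beta)}, \tag{43} \]

using (40). In particular,

\[ EX = \frac{\alpha}{\alpha + \beta} \tag{44} \]

and

\[ \operatorname{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}. \tag{45} \]

For the MGF of $X \sim B(\alpha, \beta)$, we have

\[ M(t) = \frac{1}{B(\alpha, \beta)}\int_0^1 e^{tx} x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx. \tag{46} \]

Since moments of all order exist, and $E|X|^{j} < 1$ for all $j$, we have

\[ M(t) = \sum_{j=0}^{\infty} \frac{t^{j}}{j!}\, EX^{j} = \sum_{j=0}^{\infty} \frac{t^{j}}{\Gamma(j + 1)}\, \frac{\Gamma(\alpha + j)\Gamma(\alpha + \beta)}{\Gamma(\alpha + \beta + j)\Gamma(\alpha)}. \tag{47} \]


Remark 1. Note that in the special case where $\alpha = \beta = 1$ we get the uniform distribution
on $(0, 1)$.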


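Remark 2. If $X$ is a beta RV with parameters $\alpha$ and $\beta$, then $1 - X$ is a beta variate with
parameters $\beta$ and $\alpha$. In particular, $X$ is $B(\alpha, \alpha)$ if and only if $1 - X$ is $B(\alpha, \alpha)$. A special
case is the uniform distribution on $(0, 1)$. If $X$ and $1 - X$ have the same distribution, it does
not follow that $X$ has to be $B(\alpha, \alpha)$. All this entails is that the PDF satisfies

\[ f(x) = f(1 - x), \quad 0 < x < 1. \]

Take

\[ f(x) = \frac{1}{B(\alpha, \beta) + B(\beta, \alpha)}\left[ x^{\alpha - 1}(1 - x)^{\beta - 1} + (1 - x)^{\alpha - 1} x^{\beta - 1} \right], \quad 0 < x < 1. \]


Example 3. Let $X$ be distributed with PDF

\[ f(x) = \begin{cases} 12x^{2}(1 - x), & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases} \]

Then $X \sim B(3, 2)$ and

\[ EX^{n} = \frac{\Gamma(n + 3)\Gamma(5)}{\Gamma(3)\Gamma(n + 5)} = \frac{4!}{2!}\cdot\frac{(n + 2)!}{(n + 4)!} = \frac{12}{(n + 4)(n + 3)}, \]

\[ EX = \frac{12}{20}, \qquad \operatorname{var}(X) = \frac{6}{5^{2}\cdot 6} = \frac{1}{25}, \]

\[ M(t) = \sum_{j=0}^{\infty} \frac{t^{j}}{j!}\cdot\frac{(j + 2)!}{(j + 4)!}\cdot 12 = \sum_{j=0}^{\infty} \frac{12}{(j + 4)(j + 3)}\cdot\frac{t^{j}}{j!}, \]

and

\[ P\{0.2 < X < 0.5\} = 12\int_{0.2}^{0.5} (x^{2} - x^{3})\,dx = \left[ 4x^{3} - 3x^{4} \right]_{0.2}^{0.5} = 0.3125 - 0.0272 = 0.285. \]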
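
The quantities in Example 3 can be recomputed directly from (43). The following is a small sketch with helper names chosen here for illustration (`beta_moment`, `cdf`); it evaluates the mean, the variance, and the interval probability, whose integral $12\int_{0.2}^{0.5}(x^2 - x^3)\,dx$ works out to about 0.285.

```python
import math

def beta_moment(n, a, b):
    """E X^n for X ~ B(a, b), using the gamma-function formula (43)."""
    return math.gamma(n + a) * math.gamma(a + b) / (math.gamma(a) * math.gamma(n + a + b))

a, b = 3, 2
mean = beta_moment(1, a, b)
variance = beta_moment(2, a, b) - mean ** 2

def cdf(x):
    """DF of B(3, 2) on [0, 1]: the integral of 12 y^2 (1 - y) from 0 to x."""
    return 4 * x ** 3 - 3 * x ** 4

prob = cdf(0.5) - cdf(0.2)
print(mean, variance, prob)
```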
Theorem 14. Let $X$ and $Y$ be independent $G(\alpha_1, \beta)$ and $G(\alpha_2, \beta)$ RVs, respectively. Then
$X/(X + Y)$ is a $B(\alpha_1, \alpha_2)$ RV.


Let $X_1, X_2, \ldots, X_n$ be iid RVs with the uniform distribution on $[0, 1]$. Let $X_{(k)}$ be the
$k$th-order statistic.


Theorem 15. The RV $X_{(k)}$ has a beta distribution with parameters $\alpha = k$ and $\beta = n - k + 1$.


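Proof. Let $X$ be the number of $X_i$'s that lie in $[0, t]$. Then $X$ is $b(n, t)$. We have

\[ P\{X_{(k)} \leq t\} = P\{X \geq k\} = \sum_{j=k}^{n} \binom{n}{j} t^{j}(1 - t)^{n - j}. \]

Also

\[ \frac{d}{dt} P\{X \geq k\} = \sum_{j=k}^{n} \left\{ n\binom{n - 1}{j - 1} t^{j - 1}(1 - t)^{n - j} - n\binom{n - 1}{j} t^{j}(1 - t)^{n - j - 1} \right\} = n\binom{n - 1}{k - 1} t^{k - 1}(1 - t)^{n - k}. \]

On integration, we get

\[ P\{X_{(k)} \leq t\} = n\binom{n - 1}{k - 1}\int_0^t x^{k - 1}(1 - x)^{n - k}\,dx, \]

as asserted.


Remark 3. Note that we have shown that if $X$ is $b(n, p)$, then

\[ 1 - P\{X < k\} = n\binom{n - 1}{k - 1}\int_0^{p} x^{k - 1}(1 - x)^{n - k}\,dx, \tag{48} \]

which expresses the DF of $X$ in terms of the DF of a $B(k, n - k + 1)$ RV.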
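
The binomial-beta relation (48) can be verified numerically. In the sketch below the function names and the values $n = 10$, $k = 4$, $p = 0.3$ are illustrative assumptions; the integral is approximated with a composite Simpson rule, which is very accurate here because the integrand is a polynomial.

```python
import math

def binom_upper_tail(n, k, p):
    """P{X >= k} = 1 - P{X < k} for X ~ b(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

def beta_integral(n, k, p, steps=2_000):
    """Simpson approximation of n * C(n-1, k-1) * integral over [0, p] of x^(k-1) (1-x)^(n-k) dx."""
    h = p / steps

    def g(x):
        return x ** (k - 1) * (1 - x) ** (n - k)

    total = g(0.0) + g(p)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return n * math.comb(n - 1, k - 1) * total * h / 3

n, k, p = 10, 4, 0.3   # illustrative values
tail = binom_upper_tail(n, k, p)
integral = beta_integral(n, k, p)
print(tail, integral)
```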


Theorem 16. Let $X_1, X_2, \ldots, X_n$ be independent RVs. Then $X_1, X_2, \ldots, X_n$ are iid $B(\alpha, 1)$
RVs if and only if $X_{(n)} \sim B(\alpha n, 1)$.


5.3.4 Cauchy Distribution 


Definition 5. An RV $X$ is said to have a Cauchy distribution with parameters $\mu$ and $\theta$ if
its PDF is given by

\[ f(x) = \frac{\mu}{\pi}\cdot\frac{1}{\mu^{2} + (x - \theta)^{2}}, \quad -\infty < x < \infty,\ \mu > 0. \tag{49} \]

We will write $X \sim C(\mu, \theta)$ for a Cauchy RV with density (49).

Figure 4 gives the graph of a Cauchy PDF.

Fig. 4 Cauchy density function.

We first check that (49) in fact defines a PDF. Substituting $y = (x - \theta)/\mu$, we get

\[ \int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{dy}{1 + y^{2}} = \frac{2}{\pi}\left(\tan^{-1} y\right)_{0}^{\infty} = 1. \]

The DF of a $C(1, 0)$ RV is given by

\[ F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1} x, \quad -\infty < x < \infty. \tag{50} \]


Theorem 17. Let $X$ be a Cauchy RV with parameters $\mu$ and $\theta$. The moments of order $< 1$
exist, but the moments of order $\geq 1$ do not exist for the RV $X$.


Proof. It suffices to consider the PDF

\[ f(x) = \frac{1}{\pi}\cdot\frac{1}{1 + x^{2}}, \quad -\infty < x < \infty. \]

We have

\[ E|X|^{\alpha} = \frac{2}{\pi}\int_0^{\infty} \frac{x^{\alpha}}{1 + x^{2}}\,dx, \]

and letting $z = 1/(1 + x^{2})$ in the integral, we get

\[ E|X|^{\alpha} = \frac{1}{\pi}\int_0^1 z^{[(1 - \alpha)/2] - 1}(1 - z)^{[(\alpha + 1)/2] - 1}\,dz, \]

which converges for $\alpha < 1$ and diverges for $\alpha \geq 1$. This completes the proof of the theorem.


It follows from Theorem 17 that the MGF of a Cauchy RV does not exist. This creates
some manipulative problems. We note, however, that the CF of $X \sim C(\mu, \theta)$ is given by

\[ \phi(t) = e^{i\theta t - \mu|t|}. \tag{51} \]


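Theorem 18. Let $X \sim C(\mu_1, \theta_1)$ and $Y \sim C(\mu_2, \theta_2)$ be independent RVs. Then $X + Y$ is a
$C(\mu_1 + \mu_2, \theta_1 + \theta_2)$ RV.


Proof. For notational convenience we will prove the result in the special case where
$\mu_1 = \mu_2 = 1$ and $\theta_1 = \theta_2 = 0$, that is, where $X$ and $Y$ have the common PDF

\[ f(x) = \frac{1}{\pi}\cdot\frac{1}{1 + x^{2}}, \quad -\infty < x < \infty. \]

The proof in the general case follows along the same lines. If $Z = X + Y$, the PDF of $Z$ is
given by

\[ f_Z(z) = \frac{1}{\pi^{2}}\int_{-\infty}^{\infty} \frac{1}{1 + x^{2}}\cdot\frac{1}{1 + (z - x)^{2}}\,dx. \]

Now

\[ \frac{1}{(1 + x^{2})[1 + (z - x)^{2}]} = \frac{1}{z^{2}(z^{2} + 4)}\left[ \frac{2zx}{1 + x^{2}} + \frac{z^{2}}{1 + x^{2}} + \frac{2z^{2} - 2zx}{1 + (z - x)^{2}} + \frac{z^{2}}{1 + (z - x)^{2}} \right], \]

so that

\[ f_Z(z) = \frac{1}{\pi^{2}}\cdot\frac{1}{z^{2}(z^{2} + 4)}\left[ z\log\frac{1 + x^{2}}{1 + (z - x)^{2}} + z^{2}\tan^{-1} x + z^{2}\tan^{-1}(x - z) \right]_{-\infty}^{\infty} = \frac{1}{\pi}\cdot\frac{2}{z^{2} + 2^{2}}, \quad -\infty < z < \infty. \]

It follows that, if $X$ and $Y$ are iid $C(1, 0)$ RVs, then $X + Y$ is a $C(2, 0)$ RV. We note that the
result follows effortlessly from (51).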


Corollary. Let $X_1, X_2, \ldots, X_n$ be independent Cauchy RVs, $X_k \sim C(\mu_k, \theta_k)$, $k = 1, 2, \ldots, n$.
Then $S_n = \sum_{k=1}^{n} X_k$ is a $C\left(\sum_{k=1}^{n} \mu_k, \sum_{k=1}^{n} \theta_k\right)$ RV.


In particular, if $X_1, X_2, \ldots, X_n$ are iid $C(1, 0)$ RVs, $n^{-1}S_n$ is also a $C(1, 0)$ RV. This is
a remarkable result, the importance of which will become clear in Chapter 7. Actually,
this property uniquely characterizes the Cauchy distribution. If $F$ is a nondegenerate DF
with the property that $n^{-1}S_n$ also has DF $F$, then $F$ must be a Cauchy distribution (see
Thompson [113, p. 112]).

The proof of the following result is simple. 


Theorem 19. Let $X$ be $C(\mu, 0)$. Then $\lambda/X$, where $\lambda$ is a constant, is a $C(|\lambda|/\mu, 0)$ RV.

Corollary. $X$ is $C(1, 0)$ if and only if $1/X$ is $C(1, 0)$.

We emphasize that if $X$ and $1/X$ have the same PDF on $(-\infty, \infty)$, it does not follow*
that $X$ is $C(1, 0)$, for let $X$ be an RV with PDF

\[ f(x) = \begin{cases} \dfrac{1}{4} & \text{if } |x| \leq 1, \\ \dfrac{1}{4x^{2}} & \text{if } |x| > 1. \end{cases} \]

Then $X$ and $1/X$ have the same PDF, as can be easily checked.

Theorem 20. Let $X$ be a $U(-\pi/2, \pi/2)$ RV. Then $Y = \tan X$ is a Cauchy RV.


Many important properties of the Cauchy distribution can be derived from this result 
(see Pitman and Williams [80]). 
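
Theorem 20 gives a simple way to generate Cauchy variates, and the corollary to Theorem 18 can then be illustrated by simulation. The sketch below is illustrative only: the function name `cauchy_sample`, the seed, and the sample sizes are assumptions made here. Since a $C(1, 0)$ RV satisfies $P\{|X| \leq 1\} = F(1) - F(-1) = 1/2$ by (50), the sample mean of $n$ iid $C(1, 0)$ draws should fall in $[-1, 1]$ about half the time, no matter how large $n$ is.

```python
import math
import random

def cauchy_sample(rng):
    """One C(1, 0) draw, generated as tan(U) with U ~ U(-pi/2, pi/2) (Theorem 20)."""
    return math.tan(rng.uniform(-math.pi / 2, math.pi / 2))

rng = random.Random(2024)   # fixed seed, chosen arbitrarily
n, trials = 50, 20_000      # illustrative sizes
means = [sum(cauchy_sample(rng) for _ in range(n)) / n for _ in range(trials)]

# For a C(1, 0) RV, P{|X| <= 1} = 1/2; averaging does not concentrate the distribution.
frac_within_one = sum(abs(m) <= 1 for m in means) / trials
print(frac_within_one)
```

A distribution with finite variance would give a fraction close to 1 for $n = 50$; the value near 1/2 reflects the fact that averaging does not help in the Cauchy case.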


* Menon [73] has shown that we need the condition that both $X$ and $1/X$ be stable to conclude that
$X$ is Cauchy. A nondegenerate distribution function $F$ is said to be stable if, for two iid RVs $X_1$, $X_2$
with common DF $F$, and given constants $a_1, a_2 > 0$, we can find $\alpha > 0$ and $\beta(a_1, a_2)$ such that
the RV

\[ X_3 = \alpha^{-1}(a_1 X_1 + a_2 X_2 - \beta) \]

again has the same distribution $F$. Examples are the Cauchy (see the corollary to Theorem 18)
and normal (discussed in Section 5.3.5) distributions.


5.3.5 Normal Distribution (the Gaussian Law)
One of the most important distributions in the study of probability and mathematical
statistics is the normal distribution, which we will examine presently.


Definition 6. An RV $X$ is said to have a standard normal distribution if its PDF is given by

\[ \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}, \quad -\infty < x < \infty. \tag{52} \]

We first check that $\varphi$ defines a PDF. Let

\[ I = \int_{-\infty}^{\infty} e^{-x^{2}/2}\,dx. \]

Then

\[ 0 < e^{-x^{2}/2} < e^{-|x| + 1}, \quad -\infty < x < \infty, \]

\[ \int_{-\infty}^{\infty} e^{-|x| + 1}\,dx = 2e, \]

and it follows that $I$ exists. We have

\[ I = \int_0^{\infty} y^{-1/2} e^{-y/2}\,dy = \Gamma\left(\tfrac{1}{2}\right) 2^{1/2} = \sqrt{2\pi}. \]

Thus $\int_{-\infty}^{\infty} \varphi(x)\,dx = 1$, as required.
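Let us write $Y = \sigma X + \mu$, where $\sigma > 0$. Then the PDF of $Y$ is given by

\[ \psi(y) = \frac{1}{\sigma}\,\varphi\left(\frac{y - \mu}{\sigma}\right) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-[(y - \mu)^{2}/2\sigma^{2}]}, \quad -\infty < y < \infty;\ \sigma > 0,\ -\infty < \mu < \infty. \tag{53} \]


Definition 7. An RV $X$ is said to have a normal distribution with parameters $\mu$ ($-\infty < \mu < \infty$)
and $\sigma$ ($> 0$) if its PDF is given by (53).


If $X$ is a normally distributed RV with parameters $\mu$ and $\sigma$, we will write $X \sim N(\mu, \sigma^{2})$.
In this notation $\varphi$ defined by (52) is the PDF of an $N(0, 1)$ RV. The DF of an $N(0, 1)$ RV
will be denoted by $\Phi(x)$, where

\[ \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^{2}/2}\,du. \tag{54} \]

Clearly, if $X \sim N(\mu, \sigma^{2})$, then $Z = (X - \mu)/\sigma \sim N(0, 1)$. $Z$ is called a standard normal RV.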
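For the MGF of an $N(\mu, \sigma^{2})$ RV, we have

\[ M(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\left\{ -\frac{x^{2}}{2\sigma^{2}} + x\,\frac{t\sigma^{2} + \mu}{\sigma^{2}} - \frac{\mu^{2}}{2\sigma^{2}} \right\} dx \]

\[ = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\left\{ -\frac{(x - \mu - \sigma^{2}t)^{2}}{2\sigma^{2}} + \mu t + \frac{\sigma^{2}t^{2}}{2} \right\} dx \]

\[ = \exp\left( \mu t + \frac{\sigma^{2}t^{2}}{2} \right), \tag{55} \]

for all real values of $t$. Moments of all order exist and may be computed from the MGF.
Thus

\[ EX = M'(t)\big|_{t=0} = (\mu + \sigma^{2}t)M(t)\big|_{t=0} = \mu \tag{56} \]

and

\[ EX^{2} = M''(t)\big|_{t=0} = \left[ M(t)\sigma^{2} + (\mu + \sigma^{2}t)^{2}M(t) \right]_{t=0} = \sigma^{2} + \mu^{2}. \tag{57} \]

Thus

\[ \operatorname{var}(X) = \sigma^{2}. \tag{58} \]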

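Clearly, the central moments of odd order are all 0. The central moments of even order
are as follows:

\[ E\{X - \mu\}^{2n} = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x^{2n} e^{-x^{2}/2\sigma^{2}}\,dx \quad (n \text{ is a positive integer}) \]

\[ = \frac{\sigma^{2n}}{\sqrt{2\pi}}\, 2^{n + 1/2}\, \Gamma\left(n + \tfrac{1}{2}\right) \]

\[ = [(2n - 1)(2n - 3)\cdots 3\cdot 1]\,\sigma^{2n}. \tag{59} \]

As for the absolute moment of order $\alpha$, for a standard normal RV $Z$ we have

\[ E|Z|^{\alpha} = \frac{1}{\sqrt{2\pi}}\, 2\int_0^{\infty} z^{\alpha} e^{-z^{2}/2}\,dz = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} y^{[(\alpha + 1)/2] - 1} e^{-y/2}\,dy = \frac{\Gamma[(\alpha + 1)/2]\, 2^{\alpha/2}}{\sqrt{\pi}}. \tag{60} \]

As remarked earlier, the normal distribution is one of the most important distributions
in probability and statistics, and for this reason the standard normal distribution is available
in tabular form. Table ST2 at the end of the book gives the probability $P\{Z > z\}$ for various
values of $z$ ($> 0$) in the tail of an $N(0, 1)$ RV. In this book we will write $z_{\alpha}$ for the value of
$Z$ that satisfies $\alpha = P\{Z > z_{\alpha}\}$, $0 < \alpha < 1$.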


Example 4. By Chebychev's inequality, if $E|X|^{2} < \infty$, $EX = \mu$, and $\operatorname{var}(X) = \sigma^{2}$, then

\[ P\{|X - \mu| \geq K\sigma\} \leq \frac{1}{K^{2}}. \]

For $K = 2$, we get $P\{|X - \mu| \geq K\sigma\} \leq 0.25$, and for $K = 3$, we have $P\{|X - \mu| \geq K\sigma\} \leq \frac{1}{9}$.
If $X$ is, in particular, $N(\mu, \sigma^{2})$, then

\[ P\{|X - \mu| \geq K\sigma\} = P\{|Z| \geq K\}, \]

where $Z$ is $N(0, 1)$. From Table ST2,

\[ P\{|Z| \geq 1\} = 0.318, \quad P\{|Z| \geq 2\} = 0.046, \quad \text{and} \quad P\{|Z| \geq 3\} = 0.002. \]

Thus practically all the distribution is concentrated within three standard deviations of
the mean.
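
The comparison in Example 4 can be reproduced with the standard library. This is a short sketch, not part of the original text; it uses `statistics.NormalDist` in place of Table ST2, so the last digits may differ slightly from the rounded table values.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)
rows = []
for K in (1, 2, 3):
    chebyshev = min(1.0, 1.0 / K ** 2)   # Chebychev bound on P{|X - mu| >= K sigma}
    exact = 2 * (1 - Z.cdf(K))           # exact two-sided tail P{|Z| >= K} under normality
    rows.append((K, chebyshev, exact))
    print(f"K={K}: Chebychev bound {chebyshev:.3f}, normal tail {exact:.4f}")
```

The bound holds for every distribution with finite variance, which is why it is so much larger than the exact normal tail.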


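Example 5. Let $X \sim N(3, 4)$. Then

\[ P\{2 < X \leq 5\} = P\left\{ \frac{2 - 3}{2} < \frac{X - 3}{2} \leq \frac{5 - 3}{2} \right\} = P\{-0.5 < Z \leq 1\} \]

\[ = P\{Z \leq 1\} - P\{Z \leq -0.5\} \]

\[ = 0.841 - P\{Z \geq 0.5\} \]

\[ = 0.841 - 0.309 = 0.532. \]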


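Theorem 21. (Feller [25, p. 175]). Let $Z$ be a standard normal RV. Then

\[ P\{Z > x\} \approx \frac{1}{\sqrt{2\pi}\,x}\, e^{-x^{2}/2} \quad \text{as } x \to \infty. \tag{61} \]

More precisely, for every $x > 0$

\[ \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}\left( \frac{1}{x} - \frac{1}{x^{3}} \right) < P\{Z > x\} < \frac{1}{x\sqrt{2\pi}}\, e^{-x^{2}/2}. \tag{62} \]

Proof. We have

\[ \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-(1/2)y^{2}}\left( 1 - \frac{3}{y^{4}} \right) dy = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}\left( \frac{1}{x} - \frac{1}{x^{3}} \right) \tag{63} \]

and

\[ \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-y^{2}/2}\left( 1 + \frac{1}{y^{2}} \right) dy = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}\,\frac{1}{x}, \tag{64} \]

as can be checked on differentiation. Approximation (61) follows immediately.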
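
The bounds in (62) can be checked numerically at a few points. The sketch below is illustrative (the helper name `tail_bounds` and the chosen values of $x$ are assumptions); it uses `statistics.NormalDist` for the exact tail.

```python
import math
from statistics import NormalDist

def tail_bounds(x):
    """Lower and upper bounds on P{Z > x} from (62), for x > 0."""
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    lower = phi * (1 / x - 1 / x ** 3)
    upper = phi / x
    return lower, upper

Z = NormalDist()
checks = []
for x in (0.5, 1.0, 2.0, 4.0):
    lower, upper = tail_bounds(x)
    tail = 1 - Z.cdf(x)
    checks.append(lower < tail < upper)
    print(f"x={x}: {lower:.6g} < {tail:.6g} < {upper:.6g}")
```

For small $x$ the lower bound is negative and therefore uninformative; the two bounds tighten around the tail probability as $x$ grows, which is the content of (61).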


Theorem 22. Let $X_1, X_2, \ldots, X_n$ be independent RVs with $X_k \sim N(\mu_k, \sigma_k^{2})$, $k = 1, 2, \ldots, n$.
Then $S_n = \sum_{k=1}^{n} X_k$ is an $N\left(\sum_{k=1}^{n} \mu_k, \sum_{k=1}^{n} \sigma_k^{2}\right)$ RV.


Corollary 3. If $X_1, X_2, \ldots, X_n$ are iid $N(\mu, \sigma^{2})$ RVs, then $S_n$ is an $N(n\mu, n\sigma^{2})$ RV and
$n^{-1}S_n$ is an $N(\mu, \sigma^{2}/n)$ RV.


Corollary 4. If $X_1, X_2, \ldots, X_n$ are iid $N(0, 1)$ RVs, then $n^{-1/2}S_n$ is also an $N(0, 1)$ RV.


We remark that if $X_1, X_2, \ldots, X_n$ are iid RVs with $EX = 0$, $EX^{2} = 1$ such that $n^{-1/2}S_n$ also
has the same distribution for each $n = 1, 2, \ldots$, that distribution can only be $N(0, 1)$. This
characterization of the normal distribution will become clear when we study the central
limit theorem in Chapter 7.


Theorem 23. Let $X$ and $Y$ be independent RVs. Then $X + Y$ is normally distributed if and
only if $X$ and $Y$ are both normal.


If X and Y are independent normal RVs, X + Y is normal by Theorem 22. The converse 
is due to Cramér [16] and will not be proved here. 


Theorem 24. Let X and Y be independent RVs with common N(0, 1) distribution. Then 
X+Y and X — Y are independent. 


The converse is due to Bernstein [4] and is stated here without proof. 


Theorem 25. If $X$ and $Y$ are independent RVs with the same distribution and if $Z_1 = X + Y$
and $Z_2 = X - Y$ are independent, all RVs $X$, $Y$, $Z_1$, and $Z_2$ are normally distributed.


The following result generalizes Theorem 24.

Theorem 26. If $X_1, X_2, \ldots, X_n$ are independent normal RVs and $\sum_{i=1}^{n} a_i b_i \operatorname{var}(X_i) = 0$,
then $L_1 = \sum_{i=1}^{n} a_i X_i$ and $L_2 = \sum_{i=1}^{n} b_i X_i$ are independent. Here $a_1, a_2, \ldots, a_n$ and
$b_1, b_2, \ldots, b_n$ are fixed (nonzero) real numbers.


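Proof. Let $\operatorname{var}(X_i) = \sigma_i^{2}$, and assume without loss of generality that $EX_i = 0$,
$i = 1, 2, \ldots, n$. For any real numbers $\alpha$, $\beta$, and $t$

\[ E e^{(\alpha L_1 + \beta L_2)t} = E\exp\left\{ t\sum_{i=1}^{n} (\alpha a_i + \beta b_i)X_i \right\} \]

\[ = \prod_{i=1}^{n} \exp\left\{ \frac{t^{2}}{2}(\alpha a_i + \beta b_i)^{2}\sigma_i^{2} \right\} \]

\[ = \exp\left\{ \frac{\alpha^{2}t^{2}}{2}\sum_{i=1}^{n} a_i^{2}\sigma_i^{2} + \frac{\beta^{2}t^{2}}{2}\sum_{i=1}^{n} b_i^{2}\sigma_i^{2} \right\} \quad \left( \text{since } \sum_{i=1}^{n} a_i b_i \sigma_i^{2} = 0 \right) \]

\[ = \prod_{i=1}^{n} \exp\left\{ \frac{t^{2}\alpha^{2}}{2} a_i^{2}\sigma_i^{2} \right\} \cdot \prod_{i=1}^{n} \exp\left\{ \frac{t^{2}\beta^{2}}{2} b_i^{2}\sigma_i^{2} \right\} \]

\[ = \prod_{i=1}^{n} E e^{t\alpha a_i X_i} \cdot \prod_{i=1}^{n} E e^{t\beta b_i X_i} \]

\[ = E\exp\left( t\alpha\sum_{i=1}^{n} a_i X_i \right) \cdot E\exp\left( t\beta\sum_{i=1}^{n} b_i X_i \right) \]

\[ = E e^{\alpha t L_1}\, E e^{\beta t L_2}. \]

Thus we have shown that

\[ M(\alpha t, \beta t) = M(\alpha t, 0)\,M(0, \beta t) \quad \text{for all } \alpha, \beta, t. \]

It follows that $L_1$ and $L_2$ are independent.


Corollary. If $X_1$, $X_2$ are independent $N(\mu_1, \sigma^{2})$ and $N(\mu_2, \sigma^{2})$ RVs, then $X_1 - X_2$ and
$X_1 + X_2$ are independent. (This gives Theorem 24.)


Darmois [20] and Skitovitch [106] provided the converse of Theorem 26, which we
state without proof.


Theorem 27. If $X_1, X_2, \ldots, X_n$ are independent RVs, $a_1, a_2, \ldots, a_n$, $b_1, b_2, \ldots, b_n$ are real
numbers none of which equals 0, and if the linear forms

\[ L_1 = \sum_{i=1}^{n} a_i X_i, \qquad L_2 = \sum_{i=1}^{n} b_i X_i \]

are independent, then all the RVs are normally distributed.


Corollary. If $X$ and $Y$ are independent RVs such that $X + Y$ and $X - Y$ are independent,
$X$, $Y$, $X + Y$, and $X - Y$ are all normal.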

Yet another result of this type is the following theorem. 


Theorem 28. Let $X_1, X_2, \ldots, X_n$ be iid RVs. Then the common distribution is normal if
and only if

\[ S_n = \sum_{k=1}^{n} X_k \quad \text{and} \quad Y_n = \sum_{i=1}^{n} (X_i - n^{-1}S_n)^{2} \]

are independent.


In Chapter 6 we will prove the necessity part of this result, which is basic to the theory
of t-tests in statistics (Chapter 10; see also Example 4.4.6). The sufficiency part was proved
by Lukacs [67], and we will not prove it here.

Theorem 29. $X \sim N(0, 1) \Rightarrow X^{2} \sim \chi^{2}(1)$.

Proof. See Example 2.5.7 for the proof.

Corollary 1. If $X \sim N(\mu, \sigma^{2})$, the RV $Z^{2} = (X - \mu)^{2}/\sigma^{2}$ is $\chi^{2}(1)$.


Corollary 2. If $X_1, X_2, \ldots, X_n$ are independent RVs and $X_k \sim N(\mu_k, \sigma_k^{2})$, $k = 1, 2, \ldots, n$,
then $\sum_{k=1}^{n} (X_k - \mu_k)^{2}/\sigma_k^{2}$ is $\chi^{2}(n)$.


Theorem 30. Let $X$ and $Y$ be iid $N(0, \sigma^{2})$ RVs. Then $X/Y$ is $C(1, 0)$.

Proof. For the proof see Example 2.5.7.


We remark that the converse of this result does not hold; that is, if $Z = X/Y$ is the
quotient of two iid RVs and $Z$ has a $C(1, 0)$ distribution, it does not follow that $X$ and $Y$
are normal, for take $X$ and $Y$ to be iid with PDF

\[ f(x) = \frac{\sqrt{2}}{\pi}\cdot\frac{1}{1 + x^{4}}, \quad -\infty < x < \infty. \]

We leave the reader to verify that $Z = X/Y$ is $C(1, 0)$.


5.3.6 Some Other Continuous Distributions 


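Several other distributions which are related to distributions studied earlier also arise in
practice. We record briefly some of these and their important characteristics. We will use
these distributions infrequently. We say that $X$ has a lognormal distribution if $Y = \ln X$
has a normal distribution. The PDF of $X$ is then

\[ f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(\ln x - \mu)^{2}}{2\sigma^{2}} \right\}, \quad x > 0, \tag{65} \]

and $f(x) = 0$ for $x \leq 0$, where $-\infty < \mu < \infty$, $\sigma > 0$. In fact for $x > 0$

\[ P(X \leq x) = P(\ln X \leq \ln x) = P(Y \leq \ln x) = P\left( \frac{Y - \mu}{\sigma} \leq \frac{\ln x - \mu}{\sigma} \right) = \Phi\left( \frac{\ln x - \mu}{\sigma} \right), \]

where $\Phi$ is the DF of an $N(0, 1)$ RV, which leads to (65). It is easily seen that for $n \geq 0$

\[ EX^{n} = \exp\left( n\mu + \frac{n^{2}\sigma^{2}}{2} \right), \]
\[ EX = \exp\left( \mu + \frac{\sigma^{2}}{2} \right), \qquad \operatorname{var}(X) = \exp(2\mu + 2\sigma^{2}) - \exp(2\mu + \sigma^{2}). \tag{66} \]

The MGF of $X$ does not exist.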

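We say that the RV $X$ has a Pareto distribution with parameters $\theta > 0$ and $\alpha > 0$ if its
PDF is given by

\[ f(x) = \frac{\alpha\theta^{\alpha}}{(x + \theta)^{\alpha + 1}}, \quad x > 0, \tag{67} \]

and 0 otherwise. Here $\theta$ is a scale parameter and $\alpha$ is a shape parameter. It is easy to check
that

\[ F(x) = P(X \leq x) = 1 - \frac{\theta^{\alpha}}{(\theta + x)^{\alpha}}, \quad x > 0, \tag{68} \]

\[ EX = \frac{\theta}{\alpha - 1},\ \alpha > 1, \quad \text{and} \quad \operatorname{var}(X) = \frac{\alpha\theta^{2}}{(\alpha - 2)(\alpha - 1)^{2}} \]

for $\alpha > 2$. The MGF of $X$ does not exist since not all moments of $X$ exist.

Suppose $X$ has a Pareto distribution with parameters $\theta$ and $\alpha$. Writing $Y = \ln(X/\theta)$ we
see that $Y$ has PDF

\[ f_Y(y) = \frac{\alpha e^{y}}{(1 + e^{y})^{\alpha + 1}}, \quad -\infty < y < \infty, \tag{69} \]

and DF

\[ F_Y(y) = 1 - (1 + e^{y})^{-\alpha}, \quad \text{for all } y. \]

The PDF in (69) is known as a logistic distribution. We introduce location and scale parameters
$\mu$ and $\sigma$ by writing $Z = \mu + \sigma Y$. Taking $\alpha = 1$, the PDF of $Z$ is easily seen
to be

\[ f_Z(z) = \frac{1}{\sigma}\cdot\frac{\exp\{(z - \mu)/\sigma\}}{\{1 + \exp[(z - \mu)/\sigma]\}^{2}} \tag{70} \]

for all real $z$. This is the PDF of a logistic RV with location-scale parameters $\mu$ and $\sigma$. We
leave the reader to check that

\[ EZ = \mu, \qquad \operatorname{var}(Z) = \frac{\pi^{2}\sigma^{2}}{3}, \tag{71} \]

\[ M_Z(t) = \exp(\mu t)\,\Gamma(1 - \sigma t)\,\Gamma(1 + \sigma t), \quad |t| < \frac{1}{\sigma}. \]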

The Pareto distribution is also related to an exponential distribution. Let $X$ have a Pareto PDF of
the form

\[ f_X(x) = \frac{\alpha\sigma^{\alpha}}{x^{\alpha + 1}}, \quad x > \sigma, \tag{72} \]

and 0 otherwise. A simple transformation leads to PDF (72) from (67). Then it is easily
seen that $Y = \ln(X/\sigma)$ has an exponential distribution with mean $1/\alpha$. Thus some properties
of the exponential distribution which are preserved under monotone transformations can
be derived for the Pareto PDF (72) by using the logarithmic transformation.

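Some other distributions are related to the gamma distribution. Suppose $X \sim G(1, \beta)$.
Let $Y = X^{1/\alpha}$, $\alpha > 0$. Then $Y$ has PDF

\[ f_Y(y) = \left( \frac{\alpha}{\beta} \right) y^{\alpha - 1} e^{-y^{\alpha}/\beta}, \quad y > 0, \tag{73} \]

and 0 otherwise. The RV $Y$ is said to have a Weibull distribution. We leave the reader to
show that

\[ F_Y(y) = 1 - \exp\left( -\frac{y^{\alpha}}{\beta} \right), \quad y > 0, \]

\[ EY^{n} = \beta^{n/\alpha}\,\Gamma\left( 1 + \frac{n}{\alpha} \right), \qquad EY = \beta^{1/\alpha}\,\Gamma\left( 1 + \frac{1}{\alpha} \right), \tag{74} \]

\[ \operatorname{var}(Y) = \beta^{2/\alpha}\left[ \Gamma\left( 1 + \frac{2}{\alpha} \right) - \Gamma^{2}\left( 1 + \frac{1}{\alpha} \right) \right]. \]

The MGF of $Y$ exists only for $\alpha \geq 1$ but for $\alpha > 1$ it does not have a form useful in
applications. The special case $\alpha = 2$ and $\beta = \theta^{2}$ is known as a Rayleigh distribution.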
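
The construction $Y = X^{1/\alpha}$ also gives a direct way to simulate Weibull variates. The following sketch is illustrative only (the seed and the values $\alpha = 2$, $\beta = 3$ are assumptions made here); it compares the sample mean and one value of the empirical DF with the formulas above.

```python
import math
import random

rng = random.Random(7)                      # fixed seed, chosen arbitrarily
alpha, beta, trials = 2.0, 3.0, 200_000     # illustrative parameter values

# X ~ G(1, beta) is exponential with mean beta; Y = X^(1/alpha) then has PDF (73).
ys = [rng.expovariate(1 / beta) ** (1 / alpha) for _ in range(trials)]

sample_mean = sum(ys) / trials
theory_mean = beta ** (1 / alpha) * math.gamma(1 + 1 / alpha)   # EY from (74)

frac_below_one = sum(y <= 1 for y in ys) / trials
theory_cdf_one = 1 - math.exp(-(1.0 ** alpha) / beta)           # F_Y(1)

print(sample_mean, theory_mean)
print(frac_below_one, theory_cdf_one)
```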
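Suppose $X$ has a Weibull distribution with PDF (73). Let $Y = \ln X$. Then $Y$ has DF

\[ F_Y(y) = 1 - \exp\left\{ -\frac{1}{\beta}\, e^{\alpha y} \right\}, \quad -\infty < y < \infty. \]

Setting $\theta = (1/\alpha)\ln\beta$ and $\sigma = 1/\alpha$ we get

\[ F_Y(y) = 1 - \exp\left\{ -\exp\left[ \frac{y - \theta}{\sigma} \right] \right\} \tag{75} \]

with PDF

\[ f_Y(y) = \frac{1}{\sigma}\exp\left\{ \frac{y - \theta}{\sigma} - \exp\left[ \frac{y - \theta}{\sigma} \right] \right\} \tag{76} \]

for $-\infty < y < \infty$ and $\sigma > 0$. An RV with PDF (76) is said to have an extreme value distribution
with location-scale parameters $\theta$ and $\sigma$. It can be shown that

\[ EY = \theta - \gamma\sigma, \qquad \operatorname{var}(Y) = \frac{\pi^{2}\sigma^{2}}{6}, \quad \text{and} \quad M_Y(t) = e^{\theta t}\,\Gamma(1 + \sigma t), \tag{77} \]

where $\gamma \approx 0.577216$ is the Euler constant.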


The final distribution we consider is also related to a $G(1, \beta)$ RV. Let $f_1$ be the PDF of
$G(1, \beta)$ and $f_2$ the PDF

\[ f_2(x) = \frac{1}{\beta}\exp\left( \frac{x}{\beta} \right), \quad x < 0, \qquad = 0 \text{ otherwise}. \]

Clearly $f_2$ is also an exponential PDF defined on $(-\infty, 0)$. Consider the mixture PDF

\[ f(x) = \frac{1}{2}[f_1(x) + f_2(x)], \quad -\infty < x < \infty. \tag{78} \]

Clearly,

\[ f(x) = \frac{1}{2\beta}\exp\left\{ -\frac{|x|}{\beta} \right\}, \quad -\infty < x < \infty, \tag{79} \]

and the PDF $f$ defined in (79) is called a Laplace or double exponential PDF. It is convenient
to introduce a location parameter $\mu$ and consider instead the PDF

\[ f(x) = \frac{1}{2\beta}\exp\left\{ -\frac{|x - \mu|}{\beta} \right\}, \quad -\infty < x < \infty, \tag{80} \]

where $-\infty < \mu < \infty$, $\beta > 0$. It is easy to see that for an RV $X$ with PDF (80) we have

\[ EX = \mu, \qquad \operatorname{var}(X) = 2\beta^{2}, \quad \text{and} \quad M(t) = e^{\mu t}[1 - (\beta t)^{2}]^{-1}, \tag{81} \]

for $|t| < 1/\beta$.

For completeness let us define a mixture PDF (PMF). Let $g(x \mid \theta)$ be a PDF and let $h(\theta)$
be a mixing PDF. Then the PDF

\[ f(x) = \int g(x \mid \theta)\,h(\theta)\,d\theta \tag{82} \]

is called a mixture density function. In case $h$ is a PMF with support set $\{\theta_1, \theta_2, \ldots, \theta_k\}$,
then (82) reduces to a finite mixture density function

\[ f(x) = \sum_{i=1}^{k} g(x \mid \theta_i)\,h(\theta_i). \tag{83} \]

The quantities $h(\theta_i)$ are called mixing proportions. The PDF (78) is an example with $k = 2$,
$h(\theta_1) = h(\theta_2) = 1/2$, $g(x \mid \theta_1) = f_1(x)$, and $g(x \mid \theta_2) = f_2(x)$.
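
A finite mixture such as (78) is sampled by first choosing a component according to the mixing proportions and then drawing from that component. The sketch below is illustrative (the function name, seed, and $\beta = 2$ are assumptions made here); it draws from the Laplace PDF (79) in this way and compares the sample mean and variance with $0$ and $2\beta^{2}$ from (81) with $\mu = 0$.

```python
import random

def laplace_mixture_sample(beta, rng):
    """One draw from (78): f1 (exponential on x > 0) or f2 (its mirror image), each with probability 1/2."""
    x = rng.expovariate(1 / beta)          # exponential with mean beta
    return x if rng.random() < 0.5 else -x

rng = random.Random(99)                    # fixed seed, chosen arbitrarily
beta, trials = 2.0, 200_000                # illustrative values
xs = [laplace_mixture_sample(beta, rng) for _ in range(trials)]

sample_mean = sum(xs) / trials
sample_var = sum((x - sample_mean) ** 2 for x in xs) / trials
print(sample_mean, sample_var, 2 * beta ** 2)
```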


PROBLEMS 5.3 


1. Prove Theorem 1. 


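2. Let $X$ be an RV with PMF $p_k = P\{X = k\}$ given below. If $F$ is the corresponding
DF, find the distribution of $F(X)$, in the following cases:

(a) $p_k = \binom{n}{k} p^{k}(1 - p)^{n - k}$, $k = 0, 1, 2, \ldots, n$; $0 < p < 1$.

(b) $p_k = e^{-\lambda}(\lambda^{k}/k!)$, $k = 0, 1, 2, \ldots$; $\lambda > 0$.

3. Let $Y_1 \sim U[0, 1]$, $Y_2 \sim U[0, Y_1]$, $\ldots$, $Y_n \sim U[0, Y_{n-1}]$. Show that

\[ Y_1 \sim X_1, \quad Y_2 \sim X_1 X_2, \quad \ldots, \quad Y_n \sim X_1 X_2 \cdots X_n, \]

where $X_1, X_2, \ldots, X_n$ are iid $U[0, 1]$ RVs. If $U$ is the number of $Y_1, Y_2, \ldots, Y_n$ in $[t, 1]$,
where $0 < t < 1$, show that $U$ has a Poisson distribution with parameter $-\log t$.

4. Let $X_1, X_2, \ldots, X_n$ be iid $U[0, 1]$ RVs. Prove by induction or otherwise that
$S_n = \sum_{k=1}^{n} X_k$ has the PDF

\[ f_n(x) = [(n - 1)!]^{-1}\sum_{k=0}^{n} (-1)^{k}\binom{n}{k}(x - k)^{n - 1}\varepsilon(x - k), \]

where $\varepsilon(x) = 1$ if $x \geq 0$, $= 0$ if $x < 0$.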
5. (a) Let $X$ be an RV with PMF $p_j = P(X = x_j)$, $j = 0, 1, 2, \ldots$, and let $F$ be the DF
of $X$. Show that

\[ EF(X) = \frac{1}{2}\left\{ 1 + \sum_{j=0}^{\infty} p_j^{2} \right\}, \]

\[ \operatorname{var} F(X) = \sum_{j=0}^{\infty} p_j q_{j+1}^{2} - \frac{1}{4}\left( 1 - \sum_{j=0}^{\infty} p_j^{2} \right)^{2}, \]

where $q_{j+1} = \sum_{i=j+1}^{\infty} p_i$.

(b) Let $p_j > 0$ for $j = 0, 1, \ldots, N$ and $\sum_{j=0}^{N} p_j = 1$. Show that

\[ EF(X) \geq \frac{N + 2}{2(N + 1)} \]

with equality if and only if $p_j = 1/(N + 1)$ for all $j$.
(Rohatgi [91])

6. Prove (a) Theorem 6 and its corollary, and (b) Theorem 10.

7. Let $X$ be a nonnegative RV of the continuous type, and let $Y \sim U(0, X)$. Also, let
$Z = X - Y$. Then the RVs $Y$ and $Z$ are independent if and only if $X$ is $G(2, 1/\lambda)$ for
some $\lambda > 0$. (Lamperti [59])

8. Let $X$ and $Y$ be independent RVs with common PDF $f(x) = \beta^{-\alpha}\alpha x^{\alpha - 1}$ if $0 < x < \beta$,
and $= 0$ otherwise; $\alpha \geq 1$. Let $U = \min(X, Y)$ and $V = \max(X, Y)$. Find the joint
PDF of $U$ and $V$ and the PDF of $U + V$. Show that $U/V$ and $V$ are independent.

9. Prove Theorem 14.

10. Prove Theorem 8.

11. Prove Theorems 19 and 20.

12. Let $X_1, X_2, \ldots, X_n$ be independent RVs with $X_i \sim C(\mu_i, \lambda_i)$, $i = 1, 2, \ldots, n$. Show that
the RV $X = 1/\sum_{i=1}^{n} X_i^{-1}$ is also a Cauchy RV with parameters $\mu/(\lambda^{2} + \mu^{2})$ and
$\lambda/(\lambda^{2} + \mu^{2})$, where

\[ \lambda = \sum_{i=1}^{n} \frac{\lambda_i}{\lambda_i^{2} + \mu_i^{2}} \quad \text{and} \quad \mu = \sum_{i=1}^{n} \frac{\mu_i}{\lambda_i^{2} + \mu_i^{2}}. \]

13. Let $X_1, X_2, \ldots, X_n$ be iid $C(1, 0)$ RVs and $a_i \neq 0$, $b_i$, $i = 1, 2, \ldots, n$, be any real
numbers. Find the distribution of $\sum_{i=1}^{n} 1/(a_i X_i + b_i)$.

14. Suppose that the load of an airplane wing is a random variable $X$ with
$N(1000, 14400)$ distribution. The maximum load that the wing can withstand is an
RV $Y$, which is $N(1260, 2500)$. If $X$ and $Y$ are independent, find the probability that
the load encountered by the wing is less than its critical load.

15. Let $X \sim N(0, 1)$. Find the PDF of $Z = 1/X^{2}$. If $X$ and $Y$ are iid $N(0, 1)$, deduce that
$U = XY/\sqrt{X^{2} + Y^{2}}$ is $N(0, 1/4)$.

16. In Problem 15 let $X$ and $Y$ be independent normal RVs with zero means. Show
that $U = XY/\sqrt{X^{2} + Y^{2}}$ is normal. If, in addition, $\operatorname{var}(X) = \operatorname{var}(Y)$ show that
$V = (X^{2} - Y^{2})/\sqrt{X^{2} + Y^{2}}$ is also normal. Moreover, $U$ and $V$ are independent.
(Shepp [104])

17. Let $X_1, X_2, X_3, X_4$ be independent $N(0, 1)$. Show that $Y = X_1 X_2 + X_3 X_4$ has the PDF
$f(y) = \frac{1}{2} e^{-|y|}$, $-\infty < y < \infty$.

18. Let $X \sim N(15, 16)$. Find (a) $P\{X \leq 12\}$, (b) $P\{10 \leq X \leq 17\}$, (c) $P\{10 \leq X \leq 19 \mid X \leq 17\}$,
and (d) $P\{|X - 15| \geq 0.5\}$.

19. Let $X \sim N(-1, 9)$. Find $x$ such that $P\{X > x\} = 0.38$. Also find $x$ such that
$P\{|X + 1| < x\} = 0.4$.

20. Let $X$ be an RV such that $\log(X - a)$ is $N(\mu, \sigma^{2})$. Show that $X$ has PDF

\[ f(x) = \begin{cases} \dfrac{1}{\sigma(x - a)\sqrt{2\pi}} \exp\left\{ -\dfrac{[\log(x - a) - \mu]^{2}}{2\sigma^{2}} \right\} & \text{if } x > a, \\ 0 & \text{if } x \leq a. \end{cases} \]

If $m_1$, $m_2$ are the first two moments of this distribution and $\alpha_3 = \mu_3/\mu_2^{3/2}$ is the
coefficient of skewness, show that $a$, $\mu$, $\sigma$ are given by

\[ a = m_1 - \frac{\sqrt{m_2 - m_1^{2}}}{\eta}, \qquad \sigma^{2} = \log(1 + \eta^{2}), \]

and

\[ \mu = \log(m_1 - a) - \tfrac{1}{2}\sigma^{2}, \]

where $\eta$ is the real root of the equation $\eta^{3} + 3\eta - \alpha_3 = 0$.

21. Let $X \sim G(\alpha, \beta)$ and let $Y \sim U(0, X)$.
(a) Find the PDF of $Y$.
(b) Find the conditional PDF of $X$ given $Y = y$.
(c) Find $P(X + Y \leq 2)$.

22. Let $X$ and $Y$ be iid $N(0, 1)$ RVs. Find the PDF of $X/|Y|$. Also, find the PDF of $|X|/|Y|$.

23. It is known that $X \sim B(\alpha, \beta)$ and $P(X < 0.2) = 0.22$. If $\alpha + \beta = 26$, find $\alpha$ and $\beta$.
[Hint: Use Table ST1.]

24. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^{2})$ RVs. Find the distribution of

\[ Y_n = \frac{\sum_{k=1}^{n} kX_k - \mu\sum_{k=1}^{n} k}{\left( \sum_{k=1}^{n} k^{2} \right)^{1/2}}. \]

25. Let $F_1, F_2, \ldots, F_n$ be $n$ DFs. Show that $\min[F_1(x_1), F_2(x_2), \ldots, F_n(x_n)]$ is an
$n$-dimensional DF with marginal DFs $F_1, F_2, \ldots, F_n$. (Kemp [50])

26. Let $X \sim NB(1; p)$ and $Y \sim G(1, 1/\lambda)$. Show that $X$ and $Y$ are related by the equation

\[ P\{X \leq x\} = P\{Y \leq [x]\} \quad \text{for } x > 0,\ \lambda = \log\left( \frac{1}{1 - p} \right), \]

where $[x]$ is the largest integer $\leq x$. Equivalently, show that

\[ P\{Y \in (n, n + 1]\} = P_{\theta}\{X = n\}, \]

where $\theta = 1 - e^{-\lambda}$ (Prochaska [82]).

27. Let $T$ be an RV with DF $F$ and write $S(t) = 1 - F(t) = P(T > t)$. The function $S$ is
called the survival (or reliability) function of $T$ (or DF $F$). The function $\lambda(t) = f(t)/S(t)$
is called the hazard (or failure-rate) function. For the following PDFs find the hazard
function:
(a) Rayleigh: $f(t) = (t/\alpha^{2})\exp\{-t^{2}/(2\alpha^{2})\}$, $t > 0$.
(b) Lognormal: $f(t) = 1/(t\sigma\sqrt{2\pi})\exp\{-(\ln t - \mu)^{2}/2\sigma^{2}\}$.
(c) Pareto: $f(t) = \alpha\theta^{\alpha}/t^{\alpha + 1}$, $t > \theta$, and $= 0$ otherwise.
(d) Weibull: $f(t) = (\alpha/\beta)t^{\alpha - 1}\exp(-t^{\alpha}/\beta)$, $t > 0$.
(e) Logistic: $f(t) = (1/\beta)\exp\{-(t - \mu)/\beta\}[1 + \exp\{-(t - \mu)/\beta\}]^{-2}$, $-\infty < t < \infty$.

28. Consider the PDF

\[ f(x) = \left( \frac{\lambda}{2\pi x^{3}} \right)^{1/2}\exp\left\{ -\frac{\lambda(x - \mu)^{2}}{2\mu^{2}x} \right\}, \quad x > 0, \]

and $= 0$ otherwise. An RV $X$ with PDF $f$ is said to have an inverse Gaussian
distribution with parameters $\mu$ and $\lambda$, both positive. Show that

\[ EX = \mu, \qquad \operatorname{var}(X) = \mu^{3}/\lambda, \quad \text{and} \]

\[ M(t) = E\exp(tX) = \exp\left\{ \frac{\lambda}{\mu}\left[ 1 - \left( 1 - \frac{2t\mu^{2}}{\lambda} \right)^{1/2} \right] \right\}. \]

29. Let $f$ be the PDF of an $N(\mu, \sigma^{2})$ RV:
(a) For what value of $c$ is the function $cf^{n}$, $n > 0$, a PDF?
(b) Let $\Phi$ be the DF of $Z \sim N(0, 1)$. Find $E\{Z\Phi(Z)\}$ and $E\{Z^{2}\Phi(Z)\}$.

5.4. BIVARIATE AND MULTIVARIATE NORMAL DISTRIBUTIONS 


In this section we introduce the bivariate and multivariate normal distributions and inves- 
tigate some of their important properties. We note that bivariate analogs of other PDFs are 
known but they are not always uniquely identified. For example, there are several versions 
of bivariate exponential PDFs so-called because each has exponential marginals. We will 
not encounter any of these bivariate PDFs in this book. 


Definition 1. A two-dimensional RV $(X, Y)$ is said to have a bivariate normal distribution
if the joint PDF is of the form

\[ f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^{2}}}\, e^{-Q(x, y)/2}, \quad -\infty < x < \infty,\ -\infty < y < \infty, \tag{1} \]

where $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$, and $Q$ is the positive definite quadratic form

\[ Q(x, y) = \frac{1}{1 - \rho^{2}}\left[ \left( \frac{x - \mu_1}{\sigma_1} \right)^{2} - 2\rho\left( \frac{x - \mu_1}{\sigma_1} \right)\left( \frac{y - \mu_2}{\sigma_2} \right) + \left( \frac{y - \mu_2}{\sigma_2} \right)^{2} \right]. \tag{2} \]

Figure 1 gives graphs of the bivariate normal PDF for selected values of $\rho$.

Fig. 1 Bivariate normal with $\mu_1 = \mu_2 = 0$, $\sigma_1 = \sigma_2 = 1$, and $\rho = -0.9, -0.5, 0.5, 0.9$.


We first show that (1) indeed defines a joint PDF. In fact, we prove the following result. 


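Theorem 1. The function defined by (1) and (2) with $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$ is a joint
PDF. The marginal PDFs of $X$ and $Y$ are, respectively, $N(\mu_1, \sigma_1^{2})$ and $N(\mu_2, \sigma_2^{2})$, and $\rho$ is
the correlation coefficient between $X$ and $Y$.

Proof. Let $f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$. Note that

\[ (1 - \rho^{2})\,Q(x, y) = \left( \frac{y - \mu_2}{\sigma_2} - \rho\,\frac{x - \mu_1}{\sigma_1} \right)^{2} + (1 - \rho^{2})\left( \frac{x - \mu_1}{\sigma_1} \right)^{2} \]

\[ = \left\{ \frac{y - [\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)]}{\sigma_2} \right\}^{2} + (1 - \rho^{2})\left( \frac{x - \mu_1}{\sigma_1} \right)^{2}. \]

It follows that

\[ f_1(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\exp\left[ -\frac{(x - \mu_1)^{2}}{2\sigma_1^{2}} \right]\int_{-\infty}^{\infty} \frac{\exp\{-(y - \beta_x)^{2}/[2\sigma_2^{2}(1 - \rho^{2})]\}}{\sigma_2\sqrt{1 - \rho^{2}}\sqrt{2\pi}}\,dy, \tag{3} \]

where we have written

\[ \beta_x = \mu_2 + \rho\left( \frac{\sigma_2}{\sigma_1} \right)(x - \mu_1). \tag{4} \]

The integrand is the PDF of an $N(\beta_x, \sigma_2^{2}(1 - \rho^{2}))$ RV, so that

\[ f_1(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\exp\left[ -\frac{1}{2}\left( \frac{x - \mu_1}{\sigma_1} \right)^{2} \right], \quad -\infty < x < \infty. \]

Thus

\[ \int_{-\infty}^{\infty}\left\{ \int_{-\infty}^{\infty} f(x, y)\,dy \right\} dx = \int_{-\infty}^{\infty} f_1(x)\,dx = 1, \]

and $f(x, y)$ is a joint PDF of two RVs of the continuous type. It also follows that $f_1$ is the
marginal PDF of $X$, so that $X$ is $N(\mu_1, \sigma_1^{2})$. In a similar manner we can show that $Y$ is
$N(\mu_2, \sigma_2^{2})$.

Furthermore, we have

\[ \frac{f(x, y)}{f_1(x)} = \frac{1}{\sigma_2\sqrt{1 - \rho^{2}}\sqrt{2\pi}}\exp\left\{ \frac{-(y - \beta_x)^{2}}{2\sigma_2^{2}(1 - \rho^{2})} \right\}, \tag{5} \]

where $\beta_x$ is given by (4). It is clear, then, that the conditional PDF $f_{Y|X}(y \mid x)$ given by (5)
is also normal, with parameters $\beta_x$ and $\sigma_2^{2}(1 - \rho^{2})$. We have

\[ E\{Y \mid x\} = \beta_x = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1) \tag{6} \]

and

\[ \operatorname{var}\{Y \mid x\} = \sigma_2^{2}(1 - \rho^{2}). \tag{7} \]

In order to show that $\rho$ is the correlation coefficient between $X$ and $Y$, it suffices to
show that $\operatorname{cov}(X, Y) = \rho\sigma_1\sigma_2$. We have from (6)

\[ E(XY) = E\{E\{XY \mid X\}\} = E\left\{ X\left[ \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(X - \mu_1) \right] \right\} = \mu_1\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}\,\sigma_1^{2}. \]

It follows that

\[ \operatorname{cov}(X, Y) = E(XY) - \mu_1\mu_2 = \rho\sigma_1\sigma_2. \]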
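
The conditional structure in (6) and (7) gives a two-step recipe for simulating a bivariate normal pair: draw $X$ from its marginal, then draw $Y$ from the conditional normal distribution given $X = x$. The sketch below is illustrative only (the function name, seed, and parameter values are assumptions made here); it checks that the sample correlation is close to $\rho$.

```python
import math
import random

def bivariate_normal_sample(mu1, mu2, s1, s2, rho, rng):
    """Draw (X, Y): X from N(mu1, s1^2), then Y given X = x from the conditional normal law."""
    x = rng.gauss(mu1, s1)
    beta_x = mu2 + rho * (s2 / s1) * (x - mu1)             # conditional mean, as in (6)
    y = rng.gauss(beta_x, s2 * math.sqrt(1 - rho ** 2))    # conditional variance from (7)
    return x, y

rng = random.Random(31)                                    # fixed seed, chosen arbitrarily
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.6           # illustrative parameters
pairs = [bivariate_normal_sample(mu1, mu2, s1, s2, rho, rng) for _ in range(100_000)]

n = len(pairs)
mx = sum(x for x, _ in pairs) / n
my = sum(y for _, y in pairs) / n
cov = sum((x - mx) * (y - my) for x, y in pairs) / n
vx = sum((x - mx) ** 2 for x, _ in pairs) / n
vy = sum((y - my) ** 2 for _, y in pairs) / n
sample_rho = cov / math.sqrt(vx * vy)
print(sample_rho, vy)
```

The sample variance of $Y$ should also be close to $\sigma_2^{2}$, consistent with the marginal of $Y$ being $N(\mu_2, \sigma_2^{2})$.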


Remark 1. If $\rho^{2} = 1$, then (1) becomes meaningless. But in that case we know
(Theorem 4.5.1) that there exist constants a and b such that P{Y = aX +b} = 1. We 
thus have a univariate distribution, which is called the bivariate degenerate (or singular) 
normal distribution. The bivariate degenerate normal distribution does not have a PDF 
but corresponds to an RV (X,Y) whose marginal distributions are normal or degenerate 
and are such that (X,Y) falls on a fixed line with probability 1. It is for this reason that 
degenerate distributions are considered as normal distributions with variance 0. 


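Next we compute the MGF $M(t_1, t_2)$ of a bivariate normal RV $(X, Y)$. We have, if $f(x, y)$
is the PDF given in (1) and $f_1$ is the marginal PDF of $X$,

\[ M(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x, y)\,dx\,dy \]

\[ = \int_{-\infty}^{\infty} f_1(x)\left\{ \int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f_{Y|X}(y \mid x)\,dy \right\} dx \]

\[ = \int_{-\infty}^{\infty} e^{t_1 x} f_1(x)\exp\left\{ \frac{1}{2}\sigma_2^{2}t_2^{2}(1 - \rho^{2}) + t_2\left[ \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1) \right] \right\} dx \]

\[ = \exp\left[ \frac{1}{2}\sigma_2^{2}t_2^{2}(1 - \rho^{2}) + t_2\mu_2 - \rho t_2\,\frac{\sigma_2}{\sigma_1}\,\mu_1 \right]\int_{-\infty}^{\infty} e^{(t_1 + \rho t_2 \sigma_2/\sigma_1)x} f_1(x)\,dx. \]

Now

\[ \int_{-\infty}^{\infty} e^{(t_1 + \rho t_2 \sigma_2/\sigma_1)x} f_1(x)\,dx = \exp\left[ \mu_1\left( t_1 + \rho\,\frac{\sigma_2}{\sigma_1}\,t_2 \right) + \frac{1}{2}\sigma_1^{2}\left( t_1 + \rho\,\frac{\sigma_2}{\sigma_1}\,t_2 \right)^{2} \right]. \]

Therefore,

\[ M(t_1, t_2) = \exp\left( \mu_1 t_1 + \mu_2 t_2 + \frac{\sigma_1^{2}t_1^{2} + \sigma_2^{2}t_2^{2} + 2\rho\sigma_1\sigma_2 t_1 t_2}{2} \right). \tag{8} \]

The following result is an immediate consequence of (8).

Theorem 2. If $(X, Y)$ has a bivariate normal distribution, $X$ and $Y$ are independent if and
only if $\rho = 0$.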


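Remark 2. It is quite possible for an RV $(X, Y)$ to have a bivariate density such that the
marginal densities of $X$ and $Y$ are normal and the correlation coefficient is 0, yet $X$ and $Y$
are not independent. Indeed, if the marginal densities of $X$ and $Y$ are normal, it does not
follow that the joint density of $(X, Y)$ is a bivariate normal. Let

\[ f(x, y) = \frac{1}{2}\left\{ \frac{1}{2\pi(1 - \rho^{2})^{1/2}}\exp\left[ \frac{-1}{2(1 - \rho^{2})}(x^{2} - 2\rho xy + y^{2}) \right] + \frac{1}{2\pi(1 - \rho^{2})^{1/2}}\exp\left[ \frac{-1}{2(1 - \rho^{2})}(x^{2} + 2\rho xy + y^{2}) \right] \right\}. \tag{9} \]

Here $f(x, y)$ is a joint PDF such that both marginal densities are normal, $f(x, y)$ is not
bivariate normal, and $X$ and $Y$ have zero correlation. But $X$ and $Y$ are not independent. We
have

\[ f_1(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}, \quad -\infty < x < \infty, \]

\[ f_2(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^{2}/2}, \quad -\infty < y < \infty, \]

\[ EXY = 0. \]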
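
The point of Remark 2 can be seen in a simulation. The following sketch is illustrative only (the function names, seed, and $\rho = 0.8$ are assumptions made here). It draws from the mixture (9) by picking the sign of the correlation at random, then shows that $X$ and $Y$ are nearly uncorrelated while $X^{2}$ and $Y^{2}$ are clearly correlated, which would be impossible if $X$ and $Y$ were independent.

```python
import math
import random

def mixture_sample(rho, rng):
    """Draw from (9): an equal mixture of standard bivariate normals with correlations +rho and -rho."""
    r = rho if rng.random() < 0.5 else -rho
    x = rng.gauss(0, 1)
    y = r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)
    return x, y

def corr(us, vs):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n
    var_u = sum((u - mu) ** 2 for u in us) / n
    var_v = sum((v - mv) ** 2 for v in vs) / n
    return cov / math.sqrt(var_u * var_v)

rng = random.Random(5)            # fixed seed, chosen arbitrarily
rho, trials = 0.8, 100_000        # illustrative values
pairs = [mixture_sample(rho, rng) for _ in range(trials)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

corr_xy = corr(xs, ys)                                           # should be near 0
corr_squares = corr([x * x for x in xs], [y * y for y in ys])    # clearly positive
print(corr_xy, corr_squares)
```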


Example 1. (Rosenberg [93]). Let f and g be PDFs with corresponding DFs F and G. 
Also, let 


h(x,y) =f(x)gQ)[1 + a(2F(x) — 1)(2G(Qy) — 1], (10) 


where |a| < 1 is a constant. It was shown in Example 4.3.1 that / is a bivariate density 
function with given marginal densities f and g. 
In particular, take f and g to be the PDF of (0, 1), that is, 


f(x) = a(x) = ae 00 <x<00, (11) 


and let (X,Y) have the joint PDF h(x,y). We will show that X + Y is not normal except in 
the trivial case a = 0, when X and Y are independent. 
Let Z= X-+Y. Then 
EZ =0, var(Z) = var(X) + var(Y) + 2cov(X,Y). 


It is easy to show (Problem 2) that cov(X, Y) = a/n, so that var(Z) = 2[1+ (a/7)]. If Z 
is normal, its MGF must be 


M,(t) = ef U+(e/™)), (12) 
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Next we compute the MGF of Z directly from the joint PDF (10). We have 
M, (t) _— ne), 


_fro Lr if e+ 12 F(x) — 1][2F(y) — If (xf (y) dedy 


=e +a | / : e" [2F (x) — If (x) as . 


Now 


Je "DF(x) —I]f(x)de = 2 fe (OU xar+e"/? 


—~ ie von 26 
nex { = slt+ (v4)? —2e 


eft [ne 


8 


She 8 


2 hegt pe] dudx 


: 2 2 

wf [ODDO FL) Felt VAIO 
a oof Horoial} 

eda Jz dy 

= fre Plz > =}. (13) 


where Z, is an N(0, 1) RV. 
It follows that 


2 
2 2 1 
M(t) =e talef—26'P{z, > =} 


140 (1-2°{a > 4h). (14) 


If $Z$ were normally distributed, we must have $M_1(t) = M_2(t)$ for all $t$ and all $|\alpha| \le 1$,
that is,

$$e^{t^2}\,e^{(\alpha/\pi)t^2} = e^{t^2}\left[1 + \alpha\left(1 - 2P\left\{Z_1 > \frac{t}{\sqrt{2}}\right\}\right)^2\right]. \quad (15)$$

For $\alpha = 0$, the equality clearly holds. The expression within the brackets on the right side
of (15) is bounded by $1 + \alpha$, whereas the expression $e^{(\alpha/\pi)t^2}$ is unbounded, so the equality
cannot hold for all $t$ and $\alpha$.
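The comparison at the end of Example 1 can be made concrete numerically. The sketch below is an illustration only, with $\alpha = 0.5$ and the grid of $t$ values chosen arbitrarily. It evaluates $e^{(\alpha/\pi)t^2}$ and $1 + \alpha(1 - 2P\{Z_1 > t/\sqrt{2}\})^2$, using the identity $1 - 2P\{Z_1 > t/\sqrt{2}\} = \operatorname{erf}(t/2)$ for a standard normal $Z_1$.

```python
import math

def lhs(t, alpha):
    """exp((alpha/pi) t^2): the factor required if Z = X + Y were normal."""
    return math.exp(alpha * t * t / math.pi)

def rhs(t, alpha):
    """1 + alpha * (1 - 2 P{Z1 > t/sqrt(2)})^2, written with erf(t/2)."""
    return 1.0 + alpha * math.erf(t / 2.0) ** 2

alpha = 0.5
for t in (0.1, 1.0, 3.0):
    # The two sides agree to second order near t = 0 (same variance),
    # but the left side grows without bound while the right stays below 1 + alpha.
    print(t, lhs(t, alpha), rhs(t, alpha))
```

For small $t$ the two values are nearly equal, which reflects the matching variance; for larger $t$ they separate, so $Z$ cannot be normal when $\alpha \neq 0$.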


Next we investigate the multivariate normal distribution of dimension $n$, $n \ge 2$. Let $\mathbf{M}$
be an $n \times n$ real, symmetric, and positive definite matrix. Let $\mathbf{x}$ denote the $n \times 1$ column
vector of real numbers $(x_1, x_2, \ldots, x_n)'$ and let $\boldsymbol{\mu}$ denote the column vector $(\mu_1, \mu_2, \ldots, \mu_n)'$,
where $\mu_i$ $(i = 1, 2, \ldots, n)$ are real constants.


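Theorem 3. The nonnegative function

$$f(\mathbf{x}) = c\,\exp\left\{-\frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{M}(\mathbf{x} - \boldsymbol{\mu})}{2}\right\}, \quad -\infty < x_i < \infty,\ i = 1, 2, \ldots, n, \quad (16)$$

defines the joint PDF of some random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$, provided that the
constant $c$ is chosen appropriately. The MGF of $\mathbf{X}$ exists and is given by

$$M(t_1, t_2, \ldots, t_n) = \exp\left\{\mathbf{t}'\boldsymbol{\mu} + \frac{\mathbf{t}'\mathbf{M}^{-1}\mathbf{t}}{2}\right\}, \quad (17)$$

where $\mathbf{t} = (t_1, t_2, \ldots, t_n)'$ and $t_1, t_2, \ldots, t_n$ are arbitrary real numbers.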


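Proof. Let

$$I = c\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \exp\left\{\mathbf{t}'\mathbf{x} - \frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{M}(\mathbf{x} - \boldsymbol{\mu})}{2}\right\}\prod_{i=1}^{n} dx_i. \quad (18)$$

Changing the variables of integration to $y_1, y_2, \ldots, y_n$ by writing $x_i - \mu_i = y_i$, $i = 1, 2, \ldots, n$,
and $\mathbf{y} = (y_1, y_2, \ldots, y_n)'$, we have $\mathbf{x} - \boldsymbol{\mu} = \mathbf{y}$ and

$$I = c\,\exp(\mathbf{t}'\boldsymbol{\mu})\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \exp\left(\mathbf{t}'\mathbf{y} - \frac{\mathbf{y}'\mathbf{M}\mathbf{y}}{2}\right)\prod_{i=1}^{n} dy_i. \quad (19)$$

Since $\mathbf{M}$ is positive definite, it follows that all the $n$ characteristic roots of $\mathbf{M}$, say
$m_1, m_2, \ldots, m_n$, are positive. Moreover, since $\mathbf{M}$ is symmetric there exists an $n \times n$ orthogonal
matrix $\mathbf{L}$ such that $\mathbf{L}'\mathbf{M}\mathbf{L}$ is a diagonal matrix with diagonal elements $m_1, m_2, \ldots, m_n$.
Let us change the variables to $z_1, z_2, \ldots, z_n$ by writing $\mathbf{y} = \mathbf{L}\mathbf{z}$, where $\mathbf{z}' = (z_1, z_2, \ldots, z_n)$,
and note that the Jacobian of this orthogonal transformation is $|\mathbf{L}|$. Since $\mathbf{L}'\mathbf{L} = \mathbf{I}_n$, where
$\mathbf{I}_n$ is an $n \times n$ unit matrix, the absolute value of $|\mathbf{L}|$ is 1 and we have

$$I = c\,\exp(\mathbf{t}'\boldsymbol{\mu})\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \exp\left(\mathbf{t}'\mathbf{L}\mathbf{z} - \frac{\mathbf{z}'\mathbf{L}'\mathbf{M}\mathbf{L}\mathbf{z}}{2}\right)\prod_{i=1}^{n} dz_i. \quad (20)$$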


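If we write $\mathbf{t}'\mathbf{L} = \mathbf{u}' = (u_1, u_2, \ldots, u_n)$, then $\mathbf{t}'\mathbf{L}\mathbf{z} = \sum_{i=1}^{n} u_i z_i$. Also $\mathbf{L}'\mathbf{M}\mathbf{L} =
\operatorname{diag}(m_1, m_2, \ldots, m_n)$, so that $\mathbf{z}'\mathbf{L}'\mathbf{M}\mathbf{L}\mathbf{z} = \sum_{i=1}^{n} m_i z_i^2$. The integral in (20) can therefore
be written as

$$\prod_{i=1}^{n}\left[\int_{-\infty}^{\infty} \exp\left(u_i z_i - \frac{m_i z_i^2}{2}\right) dz_i\right] = \prod_{i=1}^{n}\left[\sqrt{\frac{2\pi}{m_i}}\,\exp\left(\frac{u_i^2}{2m_i}\right)\right].$$

It follows that

$$I = c\,\exp(\mathbf{t}'\boldsymbol{\mu})\,\frac{(2\pi)^{n/2}}{(m_1 m_2 \cdots m_n)^{1/2}}\,\exp\left(\sum_{i=1}^{n}\frac{u_i^2}{2m_i}\right). \quad (21)$$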


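Setting $t_1 = t_2 = \cdots = t_n = 0$, we see from (18) and (21) that

$$\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n = c\,\frac{(2\pi)^{n/2}}{(m_1 m_2 \cdots m_n)^{1/2}}.$$

By choosing

$$c = \frac{(m_1 m_2 \cdots m_n)^{1/2}}{(2\pi)^{n/2}} \quad (22)$$

we see that $f$ is a joint PDF of some random vector $\mathbf{X}$, as asserted.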
Finally, since 


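$$(\mathbf{L}'\mathbf{M}\mathbf{L})^{-1} = \operatorname{diag}(m_1^{-1}, m_2^{-1}, \ldots, m_n^{-1}),$$

we have

$$\sum_{i=1}^{n}\frac{u_i^2}{m_i} = \mathbf{u}'(\mathbf{L}'\mathbf{M}^{-1}\mathbf{L})\mathbf{u} = \mathbf{t}'\mathbf{M}^{-1}\mathbf{t}.$$

Also

$$|\mathbf{M}^{-1}| = |\mathbf{L}'\mathbf{M}^{-1}\mathbf{L}| = (m_1 m_2 \cdots m_n)^{-1}.$$

It follows from (21) and (22) that the MGF of $\mathbf{X}$ is given by (17), and we may write

$$c = \frac{1}{\{(2\pi)^n\,|\mathbf{M}^{-1}|\}^{1/2}}. \quad (23)$$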


This completes the proof of Theorem 3. 


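Let us write $\mathbf{M}^{-1} = (\sigma_{ij})_{i,j = 1, 2, \ldots, n}$. Then

$$M(0, 0, \ldots, 0, t_i, 0, \ldots, 0) = \exp\left(t_i\mu_i + \frac{\sigma_{ii} t_i^2}{2}\right)$$

is the MGF of $X_i$, $i = 1, 2, \ldots, n$. Thus each $X_i$ is $\mathcal{N}(\mu_i, \sigma_{ii})$, $i = 1, 2, \ldots, n$. For $i \neq j$, we
have for the MGF of $X_i$ and $X_j$

$$M(0, 0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0) = \exp\left(t_i\mu_i + t_j\mu_j + \frac{\sigma_{ii} t_i^2 + 2\sigma_{ij} t_i t_j + \sigma_{jj} t_j^2}{2}\right).$$

This is the MGF of a bivariate normal distribution with means $\mu_i$, $\mu_j$, variances $\sigma_{ii}$, $\sigma_{jj}$,
and covariance $\sigma_{ij}$. Thus we see that

$$\boldsymbol{\mu}' = (\mu_1, \mu_2, \ldots, \mu_n) \quad (24)$$

is the mean vector of $\mathbf{X}' = (X_1, \ldots, X_n)$,

$$\sigma_{ii} = \sigma_i^2 = \operatorname{var}(X_i), \quad i = 1, 2, \ldots, n, \quad (25)$$

and

$$\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j, \quad i \neq j;\ i, j = 1, 2, \ldots, n. \quad (26)$$

The matrix $\mathbf{M}^{-1}$ is called the dispersion (variance-covariance) matrix of the multivariate
normal distribution.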
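As a quick consistency check for $n = 2$, the normalizing constant $c$ can be computed in three ways: from (23) using the dispersion matrix $\mathbf{M}^{-1}$, from (22) using $m_1 m_2 = |\mathbf{M}|$, and from the familiar bivariate normal constant $1/(2\pi\sigma_1\sigma_2\sqrt{1-\rho^2})$. The sketch below assumes the illustrative values $\sigma_1 = 4$, $\sigma_2 = 3$, $\rho = -3/4$; the helper functions `inv2` and `det2` are hypothetical names introduced only for this example.

```python
import math

def det2(a):
    """Determinant of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    return p * s - q * r

def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    d = det2(a)
    return [[s / d, -q / d], [-r / d, p / d]]

sigma1, sigma2, rho = 4.0, 3.0, -0.75
# Dispersion matrix M^{-1} = (sigma_ij), built from (25) and (26).
dispersion = [[sigma1 ** 2, rho * sigma1 * sigma2],
              [rho * sigma1 * sigma2, sigma2 ** 2]]
M = inv2(dispersion)

c_23 = 1.0 / math.sqrt((2 * math.pi) ** 2 * det2(dispersion))  # formula (23)
c_22 = math.sqrt(det2(M)) / (2 * math.pi)                      # formula (22), m1*m2 = |M|
c_biv = 1.0 / (2 * math.pi * sigma1 * sigma2 * math.sqrt(1 - rho ** 2))
print(c_23, c_22, c_biv)
```

All three values coincide, and for these particular parameters they equal $1/(6\pi\sqrt{7})$.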


If $\sigma_{ij} = 0$ for $i \neq j$, the matrix $\mathbf{M}^{-1}$ is a diagonal matrix, and it follows that the RVs
$X_1, X_2, \ldots, X_n$ are independent. Thus we have the following analog of Theorem 2.


Theorem 4. The components $X_1, X_2, \ldots, X_n$ of a jointly normally distributed RV $\mathbf{X}$ are
independent if and only if the covariances $\sigma_{ij} = 0$ for all $i \neq j$ $(i, j = 1, 2, \ldots, n)$.


The following result is stated without proof. The proof is similar to the two-variate case
except that now we consider the quadratic form in $n$ variables: $E\{\sum_{i=1}^{n} t_i(X_i - \mu_i)\}^2 \ge 0$.


Theorem 5. The probability that the RVs $X_1, X_2, \ldots, X_n$ with finite variances satisfy at
least one linear relationship is 1 if and only if $|\mathbf{M}| = 0$.

Accordingly, if $|\mathbf{M}| = 0$ all the probability mass is concentrated on a hyperplane of
dimension $< n$.


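Theorem 6. Let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional RV with a normal distribution. Let
$Y_1, Y_2, \ldots, Y_k$, $k \le n$, be linear functions of $X_j$ $(j = 1, 2, \ldots, n)$. Then $(Y_1, Y_2, \ldots, Y_k)$ also
has a multivariate normal distribution.


Proof. Without loss of generality let us assume that $EX_i = 0$, $i = 1, 2, \ldots, n$. Let

$$Y_p = \sum_{j=1}^{n} A_{pj} X_j, \quad p = 1, 2, \ldots, k;\ k \le n. \quad (27)$$

Then $EY_p = 0$, $p = 1, 2, \ldots, k$, and

$$\operatorname{cov}(Y_p, Y_q) = \sum_{i,j=1}^{n} A_{pi} A_{qj}\,\sigma_{ij}, \quad (28)$$

where $E(X_i X_j) = \sigma_{ij}$, $i, j = 1, 2, \ldots, n$.
The MGF of $(Y_1, Y_2, \ldots, Y_k)$ is given by

$$M^*(t_1, t_2, \ldots, t_k) = E\left\{\exp\left(t_1\sum_{j=1}^{n} A_{1j} X_j + \cdots + t_k\sum_{j=1}^{n} A_{kj} X_j\right)\right\}.$$

Writing $u_j = \sum_{p=1}^{k} t_p A_{pj}$, $j = 1, 2, \ldots, n$, we have

$$M^*(t_1, t_2, \ldots, t_k) = E\left\{\exp\left(\sum_{i=1}^{n} u_i X_i\right)\right\}$$

$$= \exp\left(\frac{1}{2}\sum_{i,j=1}^{n} \sigma_{ij} u_i u_j\right) \quad \text{by (17)}$$

$$= \exp\left(\frac{1}{2}\sum_{i,j=1}^{n} \sigma_{ij}\sum_{l,m=1}^{k} t_l t_m A_{li} A_{mj}\right)$$

$$= \exp\left(\frac{1}{2}\sum_{l,m=1}^{k} t_l t_m\sum_{i,j=1}^{n} A_{li} A_{mj}\,\sigma_{ij}\right)$$

$$= \exp\left\{\frac{1}{2}\sum_{l,m=1}^{k} t_l t_m \operatorname{cov}(Y_l, Y_m)\right\}. \quad (29)$$

When (17) and (29) are compared, the result follows.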
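Formula (28) is easy to evaluate directly. The following sketch implements it for matrices stored as nested lists and applies it to one special case: two uncorrelated components with common variance and an orthogonal (rotation) matrix $\mathbf{A}$. The angle 0.7 and the variance 2.5 are arbitrary illustrative values, and `cov_Y` is a hypothetical helper name.

```python
import math

def cov_Y(A, sigma):
    """Covariance matrix of Y = A X computed from (28):
    cov(Y_p, Y_q) = sum over i, j of A[p][i] * A[q][j] * sigma[i][j]."""
    k, n = len(A), len(sigma)
    return [[sum(A[p][i] * A[q][j] * sigma[i][j]
                 for i in range(n) for j in range(n))
             for q in range(k)]
            for p in range(k)]

theta = 0.7                      # arbitrary rotation angle
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]   # orthogonal: A A' = I
s2 = 2.5                         # common variance of the uncorrelated X's
sigma = [[s2, 0.0], [0.0, s2]]

out = cov_Y(A, sigma)
print(out)   # diagonal entries equal s2, off-diagonal entries vanish
```

Because the transformed covariance matrix is again $\sigma^2 \mathbf{I}$, the components of $\mathbf{Y}$ are uncorrelated with the same variance, and under joint normality they are therefore independent.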


Corollary 1. Every marginal distribution of an $n$-dimensional normal distribution is
univariate normal. Moreover, any linear function of $X_1, X_2, \ldots, X_n$ is univariate normal.


Corollary 2. If $X_1, X_2, \ldots, X_n$ are iid $\mathcal{N}(\mu, \sigma^2)$ and $\mathbf{A}$ is an $n \times n$ orthogonal transformation
matrix, the components $Y_1, Y_2, \ldots, Y_n$ of $\mathbf{Y} = \mathbf{A}\mathbf{X}$, where $\mathbf{X} = (X_1, \ldots, X_n)'$, are
independent RVs, each normally distributed with the same variance $\sigma^2$.


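We have from (27) and (28)

$$\operatorname{cov}(Y_p, Y_q) = \sum_{i=1}^{n} A_{pi} A_{qi}\,\sigma_{ii} + \sum_{i \neq j} A_{pi} A_{qj}\,\sigma_{ij} = \begin{cases} 0 & \text{if } p \neq q, \\ \sigma^2 & \text{if } p = q, \end{cases}$$

since $\sum_{i=1}^{n} A_{pi} A_{qi} = 0$ for $p \neq q$ and $\sum_{i=1}^{n} A_{pi}^2 = 1$. It follows that

$$M^*(t_1, t_2, \ldots, t_n) = \exp\left(\frac{\sigma^2}{2}\sum_{l=1}^{n} t_l^2\right),$$

and Corollary 2 follows.

Theorem 7. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$. Then $\mathbf{X}$ has an $n$-dimensional normal distribution
if and only if every linear function of $\mathbf{X}$

$$\mathbf{X}'\mathbf{t} = t_1 X_1 + t_2 X_2 + \cdots + t_n X_n$$

has a univariate normal distribution.


Proof. Suppose that $\mathbf{X}'\mathbf{t}$ is normal for any $\mathbf{t}$. Then the MGF of $\mathbf{X}'\mathbf{t}$ is given by

$$M(s) = \exp\left\{bs + \frac{1}{2}\sigma^2 s^2\right\}. \quad (30)$$

Here $b = E\{\mathbf{X}'\mathbf{t}\} = \sum_{i=1}^{n} t_i\mu_i = \mathbf{t}'\boldsymbol{\mu}$, where $\boldsymbol{\mu}' = (\mu_1, \ldots, \mu_n)$, and $\sigma^2 = \operatorname{var}(\mathbf{X}'\mathbf{t}) =
\operatorname{var}(\sum_{i=1}^{n} t_i X_i) = \mathbf{t}'\mathbf{M}^{-1}\mathbf{t}$, where $\mathbf{M}^{-1}$ is the dispersion matrix of $\mathbf{X}$. Thus

$$M(s) = \exp\left(\mathbf{t}'\boldsymbol{\mu}\,s + \frac{1}{2}\,\mathbf{t}'\mathbf{M}^{-1}\mathbf{t}\,s^2\right). \quad (31)$$

Let $s = 1$; then

$$M(1) = \exp\left(\mathbf{t}'\boldsymbol{\mu} + \frac{1}{2}\,\mathbf{t}'\mathbf{M}^{-1}\mathbf{t}\right), \quad (32)$$

and since the MGF is unique, it follows that $\mathbf{X}$ has a multivariate normal distribution. The
converse follows from Corollary 1 to Theorem 6.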


Many characterization results for the multivariate normal distribution are now available. 
We refer the reader to Lukacs and Laha [70, p. 79]. 
PROBLEMS 5.4 
1. Let $(X,Y)$ have joint PDF

$$f(x,y) = \frac{1}{6\pi\sqrt{7}}\exp\left\{-\frac{8}{7}\left(\frac{x^2}{16} - \frac{31}{32}x + \frac{xy}{8} + \frac{y^2}{9} - \frac{4}{3}y + \frac{71}{16}\right)\right\},$$

for $-\infty < x < \infty$, $-\infty < y < \infty$.

(a) Find the means and variances of $X$ and $Y$. Also find $\rho$.
(b) Find the conditional PDF of $Y$ given $X = x$ and $E\{Y|x\}$, $\operatorname{var}\{Y|x\}$.
(c) Find $P\{4 < Y < 6 \mid X = 4\}$.

2. In Example 1 show that $\operatorname{cov}(X, Y) = \alpha/\pi$.

3. Let $(X,Y)$ be a bivariate normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. What is
the distribution of $X + Y$? Compare your result with that of Example 1.

4. Let $(X,Y)$ be a bivariate normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$, and let
$U = aX + b$, $a \neq 0$, and $V = cY + d$, $c \neq 0$. Find the joint distribution of $(U,V)$.

5. Let $(X,Y)$ be a bivariate normal RV with parameters $\mu_1 = 5$, $\mu_2 = 8$, $\sigma_1^2 = 16$, $\sigma_2^2 = 9$,
and $\rho = 0.6$. Find $P\{5 < Y < 11 \mid X = 2\}$.

6. Let $X$ and $Y$ be jointly normal with means 0. Also, let

$$W = X\cos\theta + Y\sin\theta, \quad Z = X\cos\theta - Y\sin\theta.$$

Find $\theta$ such that $W$ and $Z$ are independent.

7. Let $(X,Y)$ be a normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Find a necessary
and sufficient condition for $X + Y$ and $X - Y$ to be independent.

8. For a bivariate normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, and $\rho$ show that

$$P(X > \mu_1, Y > \mu_2) = \frac{1}{4} + \frac{1}{2\pi}\tan^{-1}\frac{\rho}{\sqrt{1 - \rho^2}}.$$

[Hint: The required probability is $P((X - \mu_1)/\sigma_1 > 0, (Y - \mu_2)/\sigma_2 > 0)$. Change
to polar coordinates and integrate.]

9. Show that every variance-covariance matrix is symmetric positive semidefinite and
conversely. If the variance-covariance matrix is not positive definite, then with probability
1 the random (column) vector $\mathbf{X}$ lies in some hyperplane $\mathbf{c}'\mathbf{X} = a$ with
$\mathbf{c} \neq \mathbf{0}$.

10. Let $(X,Y)$ be a bivariate normal RV with $EX = EY = 0$, $\operatorname{var}(X) = \operatorname{var}(Y) = 1$, and
$\operatorname{cov}(X, Y) = \rho$. Show that the RV $Z = Y/X$ has a Cauchy distribution.

11. (a) Show that

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}}\exp\left\{-\frac{\mathbf{x}'\mathbf{x}}{2}\right\}\left[1 + \prod_{i=1}^{n}\left(x_i e^{-x_i^2/2}\right)\right]$$

is a joint PDF on $\mathcal{R}_n$.

(b) Let $(X_1, X_2, \ldots, X_n)$ have PDF $f$ given in (a). Show that the RVs in any proper
subset of $\{X_1, X_2, \ldots, X_n\}$ containing two or more elements are independent
standard normal RVs.
5.5 EXPONENTIAL FAMILY OF DISTRIBUTIONS 


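Most of the distributions that we have so far encountered belong to a general family of
distributions that we now study. Let $\Theta$ be an interval on the real line, and let $\{f_\theta : \theta \in \Theta\}$
be a family of PDFs (PMFs). Here and in what follows we write $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ unless
otherwise specified.


Definition 1. If there exist real-valued functions $Q(\theta)$ and $D(\theta)$ on $\Theta$ and Borel-measurable
functions $T(x_1, x_2, \ldots, x_n)$ and $S(x_1, x_2, \ldots, x_n)$ on $\mathcal{R}_n$ such that

$$f_\theta(x_1, x_2, \ldots, x_n) = \exp\{Q(\theta)T(\mathbf{x}) + D(\theta) + S(\mathbf{x})\}, \quad (1)$$

we say that the family $\{f_\theta, \theta \in \Theta\}$ is a one-parameter exponential family.
Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_m$ be iid with PMF (PDF) $f_\theta$. Then the joint distribution of $\mathbf{X} =
(\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_m)$ is given by

$$g_\theta(\mathbf{x}) = \prod_{j=1}^{m} f_\theta(\mathbf{x}_j) = \prod_{j=1}^{m}\exp\{Q(\theta)T(\mathbf{x}_j) + D(\theta) + S(\mathbf{x}_j)\}$$

$$= \exp\left\{Q(\theta)\sum_{j=1}^{m} T(\mathbf{x}_j) + mD(\theta) + \sum_{j=1}^{m} S(\mathbf{x}_j)\right\},$$

where $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$, $\mathbf{x}_j = (x_{j1}, x_{j2}, \ldots, x_{jn})$, $j = 1, 2, \ldots, m$, and it follows that
$\{g_\theta : \theta \in \Theta\}$ is again a one-parameter exponential family.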


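Example 1. Let $X \sim \mathcal{N}(\mu_0, \sigma^2)$, where $\mu_0$ is known and $\sigma^2$ unknown. Then

$$f_{\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x - \mu_0)^2}{2\sigma^2}\right\} = \exp\left\{-\log(\sigma\sqrt{2\pi}) - \frac{(x - \mu_0)^2}{2\sigma^2}\right\}$$

is a one-parameter exponential family with

$$Q(\sigma^2) = -\frac{1}{2\sigma^2}, \quad T(x) = (x - \mu_0)^2, \quad S(x) = 0, \quad \text{and} \quad D(\sigma^2) = -\log(\sigma\sqrt{2\pi}).$$

If $X \sim \mathcal{N}(\mu, \sigma_0^2)$, where $\sigma_0$ is known but $\mu$ is unknown, then

$$f_\mu(x) = \frac{1}{\sigma_0\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma_0^2} + \frac{\mu x}{\sigma_0^2} - \frac{\mu^2}{2\sigma_0^2}\right)$$

is a one-parameter exponential family with

$$Q(\mu) = \frac{\mu}{\sigma_0^2}, \quad D(\mu) = -\frac{\mu^2}{2\sigma_0^2}, \quad T(x) = x,$$

and

$$S(x) = -\left[\frac{x^2}{2\sigma_0^2} + \frac{1}{2}\log(2\pi\sigma_0^2)\right].$$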

Example 2. Let $X \sim P(\lambda)$, $\lambda > 0$ unknown. Then

$$P_\lambda\{X = x\} = e^{-\lambda}\frac{\lambda^x}{x!} = \exp\{-\lambda + x\log\lambda - \log(x!)\},$$

and we see that the family of Poisson PMFs with parameter $\lambda$ is a one-parameter
exponential family.
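The Poisson factorization can be verified numerically. The sketch below reads off $Q(\lambda) = \log\lambda$, $T(x) = x$, $D(\lambda) = -\lambda$, and $S(x) = -\log(x!)$ from the display above and compares $\exp\{Q T + D + S\}$ with the PMF computed directly. The grid of $x$ and $\lambda$ values is an arbitrary choice, and the function names are hypothetical.

```python
import math

def poisson_pmf(x, lam):
    """Poisson PMF computed directly: e^{-lam} lam^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def poisson_expfam(x, lam):
    """Same PMF in exponential-family form exp{Q(lam) T(x) + D(lam) + S(x)}."""
    Q = math.log(lam)           # Q(lambda) = log lambda
    T = x                       # T(x) = x
    D = -lam                    # D(lambda) = -lambda
    S = -math.lgamma(x + 1)     # S(x) = -log(x!), via log-gamma
    return math.exp(Q * T + D + S)

checks = [(x, lam) for x in range(0, 8) for lam in (0.5, 2.0, 7.5)]
max_rel_err = max(abs(poisson_expfam(x, lam) / poisson_pmf(x, lam) - 1.0)
                  for x, lam in checks)
print(max_rel_err)   # differs from 0 only by floating-point rounding
```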

Some other important examples of one-parameter exponential families are binomial,
$G(\alpha, \beta)$ (provided that one of $\alpha$, $\beta$ is fixed), $B(\alpha, \beta)$ (provided that one of $\alpha$, $\beta$ is
fixed), negative binomial, and geometric. The Cauchy family of densities and the uniform
distribution on $[0, \theta]$ do not belong to this class.


Theorem 1. Let $\{f_\theta : \theta \in \Theta\}$ be a one-parameter exponential family of PDFs (PMFs)
given in (1). Then the family of distributions of $T(\mathbf{X})$ is also a one-parameter exponential
family of PDFs (PMFs), given by

$$g_\theta(t) = \exp\{tQ(\theta) + D(\theta) + S^*(t)\}$$

for suitable $S^*(t)$.
Proof. The proof of Theorem 1 is a simple application of the transformation of variables
technique studied in Section 4.4 and is left as an exercise, at least for the cases considered
in Section 4.4. For the general case we refer to Lehmann [64, p. 58].


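Let us now consider the $k$-parameter exponential family, $k \ge 2$. Let $\Theta \subseteq \mathcal{R}_k$ be a
$k$-dimensional interval.


Definition 2. If there exist real-valued functions $Q_1, Q_2, \ldots, Q_k, D$ defined on $\Theta$, and
Borel-measurable functions $T_1, T_2, \ldots, T_k, S$ on $\mathcal{R}_n$ such that

$$f_{\boldsymbol{\theta}}(\mathbf{x}) = \exp\left\{\sum_{i=1}^{k} Q_i(\boldsymbol{\theta})T_i(\mathbf{x}) + D(\boldsymbol{\theta}) + S(\mathbf{x})\right\}, \quad (2)$$

we say that the family $\{f_{\boldsymbol{\theta}}, \boldsymbol{\theta} \in \Theta\}$ is a $k$-parameter exponential family.
Once again, if $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_m)$ and $\mathbf{X}_j$ are iid with common distribution (2), the
joint distributions of $\mathbf{X}$ form a $k$-parameter exponential family. An analog of Theorem 1
also holds for the $k$-parameter exponential family.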


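Example 3. The most important example of a $k$-parameter exponential family is $\mathcal{N}(\mu, \sigma^2)$
when both $\mu$ and $\sigma^2$ are unknown. We have

$$\boldsymbol{\theta} = (\mu, \sigma^2), \quad \Theta = \{(\mu, \sigma^2) : -\infty < \mu < \infty,\ \sigma^2 > 0\}$$

and

$$f_{\boldsymbol{\theta}}(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2 - 2\mu x + \mu^2}{2\sigma^2}\right).$$

It follows that $f_{\boldsymbol{\theta}}$ is a two-parameter exponential family with

$$Q_1(\boldsymbol{\theta}) = -\frac{1}{2\sigma^2}, \quad Q_2(\boldsymbol{\theta}) = \frac{\mu}{\sigma^2}, \quad T_1(x) = x^2, \quad T_2(x) = x,$$

$$D(\boldsymbol{\theta}) = -\frac{1}{2}\left[\frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2)\right], \quad \text{and} \quad S(x) = 0.$$

Other examples are the $G(\alpha, \beta)$ and $B(\alpha, \beta)$ distributions when both $\alpha$, $\beta$ are unknown,
and the multinomial distribution. $U[\alpha, \beta]$ does not belong to this family, nor does $C(\alpha, \beta)$.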

Some general properties of exponential families will be studied in Chapter 8, and the 
importance of these families will then become evident. 


Remark 1. The form in (2) is not unique, as is easily seen by substituting $aQ_i$ for $Q_i$ and
$(1/a)T_i$ for $T_i$. This, however, is not going to be a problem in statistical considerations.


Remark 2. The integer $k$ in Definition 2 is also not unique since the family $\{1, Q_1, \ldots, Q_k\}$
or $\{1, T_1, \ldots, T_k\}$ may be linearly dependent. In general, $k$ need not be the dimension of $\Theta$.


Remark 3. The support $\{\mathbf{x} : f_{\boldsymbol{\theta}}(\mathbf{x}) > 0\}$ does not depend on $\boldsymbol{\theta}$.


Remark 4. In (2), one can change parameters to $\eta_i = Q_i(\boldsymbol{\theta})$, $i = 1, 2, \ldots, k$, so that

$$f_{\boldsymbol{\eta}}(\mathbf{x}) = \exp\left\{\sum_{i=1}^{k} \eta_i T_i(\mathbf{x}) + D(\boldsymbol{\eta}) + S(\mathbf{x})\right\}, \quad (3)$$

where the parameters $\boldsymbol{\eta} = (\eta_1, \eta_2, \ldots, \eta_k)$ are called natural parameters. Again the $\eta_i$ may be
linearly dependent, so one of the $\eta_i$ may be eliminated.


PROBLEMS 5.5 


1. Show that the following families of distributions are one-parameter exponential
families:
(a) $X \sim b(n, p)$.
(b) $X \sim G(\alpha, \beta)$, (i) if $\alpha$ is known and (ii) if $\beta$ is known.
(c) $X \sim B(\alpha, \beta)$, (i) if $\alpha$ is known and (ii) if $\beta$ is known.
(d) $X \sim NB(r; p)$, where $r$ is known, $p$ unknown.

2. Let $X \sim C(1, \theta)$. Show that the family of distributions of $X$ is not a one-parameter
exponential family.


3. Let $X \sim U[0, \theta]$, $\theta \in [0, \infty)$. Show that the family of distributions of $X$ is not an
exponential family.


4. Is the family of PDFs

$$f_\theta(x) = \frac{1}{2}\,e^{-|x - \theta|}, \quad -\infty < x < \infty,\ \theta \in (-\infty, \infty),$$

an exponential family?

5. Show that the following families of distributions are two-parameter exponential
families:

(a) $X \sim G(\alpha, \beta)$, both $\alpha$ and $\beta$ unknown.
(b) $X \sim B(\alpha, \beta)$, both $\alpha$ and $\beta$ unknown.

6. Show that the families of distributions $U[\alpha, \beta]$ and $C(\alpha, \beta)$ do not belong to the
exponential families.

7. Show that the multinomial distributions form an exponential family. 


SAMPLE STATISTICS AND THEIR 
DISTRIBUTIONS 


6.1 INTRODUCTION 


In the preceding chapters we discussed fundamental ideas and techniques of probability 
theory. In this development we created a mathematical model of a random experiment by 
associating with it a sample space in which random events correspond to sets of a certain
$\sigma$-field. The notion of probability defined on this $\sigma$-field corresponds to the notion of
uncertainty in the outcome on any performance of the random experiment.

In this chapter we begin the study of some problems of mathematical statistics. The 
methods of probability theory learned in preceding chapters will be used extensively in 
this study. 

Suppose that we seek information about some numerical characteristics of a collection 
of elements called a population. For reasons of time or cost we may not wish or be able to 
study each individual element of the population. Our object is to draw conclusions about 
the unknown population characteristics on the basis of information on some characteristics 
of a suitably selected sample. Formally, let $X$ be a random variable which describes the
population under investigation, and let $F$ be the DF of $X$. There are two possibilities. Either
$X$ has a DF $F_\theta$ with a known functional form (except perhaps for the parameter $\theta$, which
may be a vector) or $X$ has a DF $F$ about which we know nothing (except perhaps that $F$
is, say, absolutely continuous). In the former case let $\Theta$ be the set of possible values of the
unknown parameter $\theta$. Then the job of a statistician is to decide, on the basis of a suitably
selected sample, which member or members of the family $\{F_\theta, \theta \in \Theta\}$ can represent the
DF of $X$. Problems of this type are called problems of parametric statistical inference and
will be the subject of investigation in Chapters 8 through 12. The case in which nothing is
known about the functional form of the DF $F$ of $X$ is clearly much more difficult. Inference
problems of this type fall into the domain of nonparametric statistics and will be discussed
in Chapter 13.

To be sure, the scope of statistical methods is much wider than the statistical inference 
problems discussed in this book. Statisticians, for example, deal with problems of plan- 
ning and designing experiments, of collecting information, and of deciding how best the 
collected information should be used. However, here we concern ourselves only with the 
best methods of making inferences about probability distributions. 

In Section 6.2 of this chapter we introduce the notions of (simple) random sample and 
sample statistics. In Section 6.3 we study sample moments and their exact distributions. In 
Section 6.4 we consider some important distributions that arise in sampling from a normal 
population. Sections 6.5 and 6.6 are devoted to the study of sampling from univariate and 
bivariate normal distributions. 


6.2 RANDOM SAMPLING 


Consider a statistical experiment that culminates in outcomes $x$, which are the values
assumed by an RV $X$. Let $F$ be the DF of $X$. In practice, $F$ will not be completely known,
that is, one or more parameters associated with $F$ will be unknown. The job of a statistician
is to estimate these unknown parameters or to test the validity of certain statements about
them. She can obtain $n$ independent observations on $X$. This means that she observes $n$
values $x_1, x_2, \ldots, x_n$ assumed by the RV $X$. Each $x_i$ can be regarded as the value assumed
by an RV $X_i$, $i = 1, 2, \ldots, n$, where $X_1, X_2, \ldots, X_n$ are independent RVs with common DF $F$.
The observed values $(x_1, x_2, \ldots, x_n)$ are then values assumed by $(X_1, X_2, \ldots, X_n)$. The set
$\{X_1, X_2, \ldots, X_n\}$ is then a sample of size $n$ taken from a population distribution $F$. The set
of $n$ values $x_1, x_2, \ldots, x_n$ is called a realization of the sample. Note that the possible values
of the RV $(X_1, X_2, \ldots, X_n)$ can be regarded as points in $\mathcal{R}_n$, which may be called the sample
space. In practice one observes not $x_1, x_2, \ldots, x_n$ but some function $f(x_1, x_2, \ldots, x_n)$. Then
$f(x_1, x_2, \ldots, x_n)$ are values assumed by the RV $f(X_1, X_2, \ldots, X_n)$.
Let us now formalize these concepts.


Definition 1. Let $X$ be an RV with DF $F$, and let $X_1, X_2, \ldots, X_n$ be iid RVs with common
DF $F$. Then the collection $X_1, X_2, \ldots, X_n$ is known as a random sample of size $n$ from the
DF $F$ or simply as $n$ independent observations on $X$.


If $X_1, X_2, \ldots, X_n$ is a random sample from $F$, their joint DF is given by

$$F^*(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F(x_i). \quad (1)$$


Definition 2. Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on an RV $X$, and let
$f : \mathcal{R}_n \to \mathcal{R}_k$ be a Borel-measurable function. Then the RV $f(X_1, X_2, \ldots, X_n)$ is called a
(sample) statistic provided that it is not a function of any unknown parameter(s).


Two of the most commonly used statistics are defined as follows.
Definition 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution function $F$. Then
the statistic

$$\overline{X} = \frac{\sum_{i=1}^{n} X_i}{n} \quad (2)$$

is called the sample mean, and the statistic

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \overline{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - n\overline{X}^2}{n - 1} \quad (3)$$

is called the sample variance and $S$ is called the sample standard deviation.


Remark 1. Whenever the word “sample” is used subsequently, it will mean “random
sample.”


Remark 2. Sampling from a probability distribution (Definition 1) is sometimes referred 
to as sampling from an infinite population since one can obtain samples of any size one 
desires even if the population is finite (by sampling with replacement). 


Remark 3. In sampling without replacement from a finite population, the independence
condition of Definition 1 is not satisfied. Suppose a sample of size 2 is taken from a finite
population $(a_1, a_2, \ldots, a_N)$ without replacement. Let $X_i$ be the outcome on the $i$th draw.
Then $P\{X_1 = a_1\} = 1/N$, $P\{X_2 = a_2 \mid X_1 = a_1\} = 1/(N - 1)$, and $P\{X_2 = a_2 \mid X_1 = a_2\} = 0$.
Thus the PMF of $X_2$ depends on the outcome of the first draw (that is, on the value of $X_1$),
and $X_1$ and $X_2$ are not independent. Note, however, that

$$P\{X_2 = a_2\} = \sum_{j=1}^{N} P\{X_1 = a_j\}\,P\{X_2 = a_2 \mid X_1 = a_j\} = \sum_{j \neq 2} P\{X_1 = a_j\}\,P\{X_2 = a_2 \mid X_1 = a_j\} = \frac{1}{N},$$

and $X_1 \stackrel{d}{=} X_2$. A similar argument can be used to show that $X_1, X_2, \ldots, X_n$ all have the same
distribution but they are not independent. In fact, $X_1, X_2, \ldots, X_n$ are exchangeable RVs.
Sampling without replacement from a finite population is often referred to as simple
random sampling.
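The probabilities in Remark 3 can be confirmed by listing all ordered samples of size 2. The sketch below uses a hypothetical population of $N = 4$ labeled elements and exact rational arithmetic; the function names are introduced only for this illustration.

```python
from fractions import Fraction
from itertools import permutations

population = ["a1", "a2", "a3", "a4"]        # hypothetical population, N = 4
N = len(population)
draws = list(permutations(population, 2))    # ordered samples (X1, X2), equally likely
prob = Fraction(1, len(draws))

def marginal_x2(value):
    """P{X2 = value}, summing over all possible first draws."""
    return sum(prob for x1, x2 in draws if x2 == value)

def conditional_x2(value, given_x1):
    """P{X2 = value | X1 = given_x1}."""
    num = sum(prob for x1, x2 in draws if x1 == given_x1 and x2 == value)
    den = sum(prob for x1, x2 in draws if x1 == given_x1)
    return num / den

print(marginal_x2("a2"))            # 1/N
print(conditional_x2("a2", "a1"))   # 1/(N - 1)
print(conditional_x2("a2", "a2"))   # 0
```

The marginal PMF of the second draw matches that of the first, while the conditional PMF depends on the first draw, which is the sense in which the draws are identically distributed but not independent.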


Remark 4. It should be remembered that sample statistics $\overline{X}$, $S^2$ (and others that we will
define later on) are random variables, while the population parameters $\mu$, $\sigma^2$, and so on
are fixed constants that may be unknown.


Remark 5. In (3) we divide by $n - 1$ rather than $n$. The reason for this will become clear
in the next section.

Remark 6. Other frequently occurring examples of statistics are sample order statistics
$X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ and their functions, as well as sample moments, which will be studied
in the next section.

Example 1. Let $X \sim b(1, p)$, where $p$ is possibly unknown. The DF of $X$ is given by

$$F(x) = p\,\varepsilon(x - 1) + (1 - p)\,\varepsilon(x), \quad x \in \mathcal{R}.$$

Suppose that five independent observations on $X$ are 0, 1, 1, 1, 0. Then 0, 1, 1, 1, 0 is a
realization of the sample $X_1, X_2, \ldots, X_5$. The sample mean is

$$\overline{x} = \frac{0 + 1 + 1 + 1 + 0}{5} = 0.6,$$

which is the value assumed by the RV $\overline{X}$. The sample variance is

$$s^2 = \sum_{i=1}^{5}\frac{(x_i - \overline{x})^2}{5 - 1} = \frac{2(0.6)^2 + 3(0.4)^2}{4} = 0.3,$$

which is the value assumed by the RV $S^2$. Also $s = \sqrt{0.3} = 0.55$.
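The arithmetic of Example 1 can be reproduced in a few lines. This is a minimal sketch; the function names `sample_mean` and `sample_variance` are hypothetical, and both forms of the sample variance in (3) are evaluated for comparison.

```python
import math

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance S^2 from (3): squared deviations divided by n - 1."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

obs = [0, 1, 1, 1, 0]            # the realization used in Example 1
xbar = sample_mean(obs)
s2 = sample_variance(obs)
s = math.sqrt(s2)
# Second form in (3): (sum of squares - n * xbar^2) / (n - 1).
s2_alt = (sum(x * x for x in obs) - len(obs) * xbar ** 2) / (len(obs) - 1)
print(xbar, s2, s2_alt, round(s, 2))
```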


Example 2. Let $X \sim \mathcal{N}(\mu, \sigma^2)$, where $\mu$ is known but $\sigma^2$ is unknown. Let $X_1, X_2, \ldots, X_n$ be
a sample from $\mathcal{N}(\mu, \sigma^2)$. Then, according to our definition, $\sum_{i=1}^{n} X_i/\sigma^2$ is not a statistic.

Suppose that five observations on $X$ are $-0.864$, $0.561$, $2.355$, $0.582$, $-0.774$. Then the
sample mean is 0.372, and the sample variance is 1.713.


PROBLEMS 6.2 


1. Let $X$ be a $b(1, \frac{1}{2})$ RV, and consider all possible random samples of size 3 on $X$.
Compute $\overline{X}$ and $S^2$ for each of the eight samples, and also compute the PMFs of $\overline{X}$
and $S^2$.

2. A fair die is rolled. Let $X$ be the face value that turns up, and $X_1$, $X_2$ be two
independent observations on $X$. Compute the PMF of $\overline{X}$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from some population. Show that

$$\max_{1 \le i \le n} |X_i - \overline{X}| < \frac{(n - 1)S}{\sqrt{n}}$$

unless either all the $n$ observations are equal or exactly $n - 1$ of the $X_i$'s are equal.
(Samuelson [99])
4. Let $x_1, x_2, \ldots, x_n$ be real numbers, and let $x_{(n)} = \max\{x_1, x_2, \ldots, x_n\}$, $x_{(1)} =
\min\{x_1, x_2, \ldots, x_n\}$. Show that for any set of real numbers $a_1, a_2, \ldots, a_n$ such that
$\sum_{i=1}^{n} a_i = 0$ the following inequality holds:

$$\left|\sum_{i=1}^{n} a_i x_i\right| \le \frac{1}{2}\left[x_{(n)} - x_{(1)}\right]\sum_{i=1}^{n} |a_i|.$$

5. For any set of real numbers $x_1, x_2, \ldots, x_n$ show that the fraction of $x_1, x_2, \ldots, x_n$
included in the interval $(\overline{x} - ks, \overline{x} + ks)$ for $k \ge 1$ is at least $1 - 1/k^2$. Here $\overline{x}$ is the
mean and $s$ the standard deviation of the $x$'s.


6.3 SAMPLE CHARACTERISTICS AND THEIR DISTRIBUTIONS 


Let $X_1, X_2, \ldots, X_n$ be a sample from a population DF $F$. In this section we consider some
commonly used sample characteristics and their distributions.


Definition 1. Let $F_n^*(x) = n^{-1}\sum_{j=1}^{n} \varepsilon(x - X_j)$. Then $nF_n^*(x)$ is the number of $X_k$'s ($1 \le
k \le n$) that are $\le x$. $F_n^*(x)$ is called the sample (or empirical) distribution function.


We note that $0 \le F_n^*(x) \le 1$ for all $x$, and, moreover, that $F_n^*$ is right continuous,
nondecreasing, and $F_n^*(-\infty) = 0$, $F_n^*(\infty) = 1$. Thus $F_n^*$ is a DF.

If $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ is the order statistic for $X_1, X_2, \ldots, X_n$, then clearly

$$F_n^*(x) = \begin{cases} 0 & \text{if } x < X_{(1)}, \\ k/n & \text{if } X_{(k)} \le x < X_{(k+1)} \quad (k = 1, 2, \ldots, n - 1), \\ 1 & \text{if } x \ge X_{(n)}. \end{cases} \quad (1)$$

For fixed but otherwise arbitrary $x \in \mathcal{R}$, $F_n^*(x)$ itself is an RV of the discrete type. The
following result is immediate.


Theorem 1. The RV $F_n^*(x)$ has the probability function

$$P\left\{F_n^*(x) = \frac{j}{n}\right\} = \binom{n}{j}[F(x)]^j[1 - F(x)]^{n-j}, \quad j = 0, 1, \ldots, n, \quad (2)$$

with mean

$$EF_n^*(x) = F(x) \quad (3)$$

and variance

$$\operatorname{var}(F_n^*(x)) = \frac{F(x)[1 - F(x)]}{n}. \quad (4)$$


Proof. Since $\varepsilon(x - X_j)$, $j = 1, 2, \ldots, n$, are iid RVs, each with PMF

$$P\{\varepsilon(x - X_j) = 1\} = P\{x - X_j \ge 0\} = F(x)$$

and

$$P\{\varepsilon(x - X_j) = 0\} = 1 - F(x),$$

their sum $nF_n^*(x)$ is a $b(n, p)$ RV, where $p = F(x)$. Relations (2), (3), and (4) follow
immediately.

We next consider some typical values of the DF $F_n^*(x)$, called sample statistics. Since
$F_n^*(x)$ has jump points $X_j$, $j = 1, 2, \ldots, n$, it is clear that all moments of $F_n^*(x)$ exist. Let us
write

$$a_k = n^{-1}\sum_{j=1}^{n} X_j^k \quad (5)$$

for the moment of order $k$ about 0. Here $a_k$ will be called the sample moment of order $k$.
In this notation

$$a_1 = n^{-1}\sum_{j=1}^{n} X_j = \overline{X}. \quad (6)$$

The sample central moment is defined by

$$b_k = n^{-1}\sum_{j=1}^{n} (X_j - a_1)^k = n^{-1}\sum_{j=1}^{n} (X_j - \overline{X})^k. \quad (7)$$

Clearly,

$$b_1 = 0 \quad \text{and} \quad b_2 = \left(\frac{n - 1}{n}\right)S^2.$$

As mentioned earlier, we do not call $b_2$ the sample variance. $S^2$ will be referred to as the
sample variance for reasons that will subsequently become clear. We have

$$b_2 = a_2 - a_1^2. \quad (8)$$

For the MGF of DF $F_n^*(x)$, we have

$$M^*(t) = n^{-1}\sum_{j=1}^{n} e^{tX_j}. \quad (9)$$

Similar definitions are made for sample moments of bivariate and multivariate distributions.
For example, if $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ is a sample from a bivariate
distribution, we write

$$\overline{X} = n^{-1}\sum_{j=1}^{n} X_j \quad \text{and} \quad \overline{Y} = n^{-1}\sum_{j=1}^{n} Y_j, \quad (10)$$

$$b_{20} = n^{-1}\sum_{j=1}^{n} (X_j - \overline{X})^2, \quad b_{02} = n^{-1}\sum_{j=1}^{n} (Y_j - \overline{Y})^2, \quad (11)$$

$$b_{11} = n^{-1}\sum_{j=1}^{n} (X_j - \overline{X})(Y_j - \overline{Y}).$$

Once again we write

$$S_1^2 = (n - 1)^{-1}\sum_{j=1}^{n} (X_j - \overline{X})^2 \quad \text{and} \quad S_2^2 = (n - 1)^{-1}\sum_{j=1}^{n} (Y_j - \overline{Y})^2 \quad (12)$$

for the two sample variances, and for the sample covariance we use the quantity

$$S_{11} = (n - 1)^{-1}\sum_{j=1}^{n} (X_j - \overline{X})(Y_j - \overline{Y}). \quad (13)$$

In particular, the sample correlation coefficient is defined by

$$R = \frac{b_{11}}{\sqrt{b_{20}\,b_{02}}} = \frac{S_{11}}{S_1 S_2}. \quad (14)$$

It can be shown (Problem 4) that $|R| \le 1$; the extreme values $\pm 1$ can occur only when all
sample points $(X_1, Y_1), \ldots, (X_n, Y_n)$ lie on a straight line.

The sample quantiles are defined in a similar manner. Thus, if $0 < p < 1$, the sample
quantile of order $p$, denoted by $Z_p$, is the order statistic $X_{(r)}$, where

$$r = \begin{cases} np & \text{if } np \text{ is an integer,} \\ [np + 1] & \text{if } np \text{ is not an integer.} \end{cases}$$

As usual, $[x]$ is the largest integer $\le x$. Note that, if $np$ is an integer, we can take any value
between $X_{(np)}$ and $X_{(np+1)}$ as the $p$th sample quantile. Thus, if $p = \frac{1}{2}$ and $n$ is even, we
can take any value between $X_{(n/2)}$ and $X_{(n/2)+1}$, the two middle values, as the median. It
is customary to take the average. Thus the sample median is defined as

$$Z_{1/2} = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd,} \\[4pt] \dfrac{X_{(n/2)} + X_{((n/2)+1)}}{2} & \text{if } n \text{ is even.} \end{cases}$$

Note that

$$\left[\frac{n}{2} + 1\right] = \frac{n + 1}{2}$$

if $n$ is odd.

Example 1. A random sample of 25 observations is taken from the interval (0,1):


0.50 0.24 0.89 0.54 0.34 0.89 0.92 0.17 0.32 0.80
0.06 0.21 0.58 0.07 0.56 0.20 0.31 0.17 0.41 0.38
0.88 0.61 0.35 0.06 0.90

In order to compute $F_{25}^*$, the first step is to order the observations from smallest to largest.
The ordered sample is


0.06, 0.06, 0.07, 0.17, 0.17, 0.20, 0.21, 0.24, 0.31, 0.32, 0.34,
0.35, 0.38, 0.41, 0.50, 0.54, 0.56, 0.58, 0.61, 0.80, 0.88, 0.89,
0.89, 0.90, 0.92


Then the empirical DF is given by

$$F_{25}^*(x) = \begin{cases} 0, & x < 0.06, \\ 2/25, & 0.06 \le x < 0.07, \\ 3/25, & 0.07 \le x < 0.17, \\ 5/25, & 0.17 \le x < 0.20, \\ \ \ \vdots & \\ 24/25, & 0.90 \le x < 0.92, \\ 1, & x \ge 0.92. \end{cases}$$

A plot of $F_{25}^*$ is shown in Fig. 1. The sample mean and variance are

$$\overline{x} = 0.45, \quad s^2 = 0.084, \quad \text{and} \quad s = 0.29.$$

Also the sample median is the 13th observation in the ordered sample, namely, $z_{1/2} = 0.38$,
and if $p = 0.2$ then $np = 5$ and $z_{0.2} = 0.17$.
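The quantities in this example are straightforward to recompute. The sketch below evaluates the empirical DF by counting observations that are at most $x$, and computes the sample quantile $X_{(r)}$ with $r = np$ if $np$ is an integer and $r = [np + 1]$ otherwise. The function names are hypothetical, and the integer test on $np$ uses a floating-point tolerance, which is an implementation choice rather than part of the definition.

```python
import math

data = [0.50, 0.24, 0.89, 0.54, 0.34, 0.89, 0.92, 0.17, 0.32, 0.80,
        0.06, 0.21, 0.58, 0.07, 0.56, 0.20, 0.31, 0.17, 0.41, 0.38,
        0.88, 0.61, 0.35, 0.06, 0.90]

def empirical_df(sample, x):
    """F_n^*(x): fraction of observations that are <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def sample_quantile(sample, p):
    """Order statistic X_(r), r = np if np is an integer, else floor(np + 1)."""
    ordered = sorted(sample)
    np_ = len(ordered) * p
    if math.isclose(np_, round(np_)):
        r = round(np_)
    else:
        r = math.floor(np_ + 1)
    return ordered[r - 1]          # order statistics are 1-indexed

n = len(data)
xbar = sum(data) / n
s2 = sum((v - xbar) ** 2 for v in data) / (n - 1)

print(empirical_df(data, 0.17))    # 5/25
print(sample_quantile(data, 0.5))  # 13th ordered observation
print(sample_quantile(data, 0.2))  # 5th ordered observation
print(round(xbar, 2), round(s2, 3), round(math.sqrt(s2), 2))
```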


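Fig. 1 Empirical DF for data of Example 1.

Next we consider the moments of sample characteristics. In the following we write
$EX^k = m_k$ and $E(X - \mu)^k = \mu_k$ for the $k$th-order population moments. Wherever we use
$m_k$ (or $\mu_k$), it will be assumed to exist. Also, $\sigma^2$ represents the population variance.


Theorem 2. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with DF $F$. Then

$$E\overline{X} = \mu, \quad (16)$$

$$\operatorname{var}(\overline{X}) = \frac{\sigma^2}{n}, \quad (17)$$

$$E\overline{X}^3 = \frac{m_3 + 3(n - 1)m_2\mu + (n - 1)(n - 2)\mu^3}{n^2}, \quad (18)$$

and

$$E\overline{X}^4 = \frac{m_4 + 4(n - 1)m_3\mu + 3(n - 1)m_2^2 + 6(n - 1)(n - 2)m_2\mu^2 + (n - 1)(n - 2)(n - 3)\mu^4}{n^3}. \quad (19)$$


Proof. In view of Theorems 4.5.3 and 4.5.7, it suffices to prove (18) and (19). We have

$$\left(\sum_{j=1}^{n} X_j\right)^3 = \sum_{j=1}^{n} X_j^3 + 3\sum_{j \neq k} X_j^2 X_k + \sum_{j \neq k \neq l} X_j X_k X_l,$$

and (18) follows. Similarly,

$$\left(\sum_{i=1}^{n} X_i\right)^4 = \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{j=1}^{n} X_j^3 + 3\sum_{j \neq k} X_j^2 X_k + \sum_{j \neq k \neq l} X_j X_k X_l\right)$$

$$= \sum_{i=1}^{n} X_i^4 + 4\sum_{j \neq k} X_j X_k^3 + 3\sum_{j \neq k} X_j^2 X_k^2 + 6\sum_{i \neq j \neq k} X_i^2 X_j X_k + \sum_{i \neq j \neq k \neq l} X_i X_j X_k X_l,$$

and (19) follows.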
and (19) follows. 
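Theorem 3. For the third and fourth central moments of $\overline{X}$, we have

$$\mu_3(\overline{X}) = \frac{\mu_3}{n^2} \quad (20)$$

and

$$\mu_4(\overline{X}) = \frac{\mu_4}{n^3} + \frac{3(n - 1)\mu_2^2}{n^3}. \quad (21)$$

Proof. We have

$$\mu_3(\overline{X}) = E(\overline{X} - \mu)^3 = \frac{1}{n^3}E\left\{\sum_{i=1}^{n} (X_i - \mu)\right\}^3 = \frac{1}{n^3}\sum_{i=1}^{n} E(X_i - \mu)^3 = \frac{\mu_3}{n^2},$$

and

$$\mu_4(\overline{X}) = E(\overline{X} - \mu)^4 = \frac{1}{n^4}E\left\{\sum_{i=1}^{n} (X_i - \mu)\right\}^4$$

$$= \frac{1}{n^4}\sum_{i=1}^{n} E(X_i - \mu)^4 + \binom{4}{2}\frac{1}{n^4}\sum_{i<j} E\{(X_i - \mu)^2 (X_j - \mu)^2\}$$

$$= \frac{\mu_4}{n^3} + \frac{3(n - 1)}{n^3}\mu_2^2.$$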
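Theorem 4. For the moments of $b_2$, we have

$$E(b_2) = \frac{(n - 1)\sigma^2}{n}, \quad (22)$$

$$\operatorname{var}(b_2) = \frac{\mu_4 - \mu_2^2}{n} - \frac{2(\mu_4 - 2\mu_2^2)}{n^2} + \frac{\mu_4 - 3\mu_2^2}{n^3}, \quad (23)$$

$$E(b_3) = \frac{(n - 1)(n - 2)}{n^2}\mu_3, \quad (24)$$

and

$$E(b_4) = \frac{(n - 1)(n^2 - 3n + 3)}{n^3}\mu_4 + \frac{3(n - 1)(2n - 3)}{n^3}\mu_2^2. \quad (25)$$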

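Proof. We have

$$Eb_2 = \frac{1}{n}E\left\{\sum_{i=1}^{n} (X_i - \mu)^2 - n(\overline{X} - \mu)^2\right\} = \sigma^2 - \frac{\sigma^2}{n} = \frac{(n - 1)\sigma^2}{n}.$$

Now

$$n^2 b_2^2 = \left\{\sum_{i=1}^{n} (X_i - \overline{X})^2\right\}^2.$$

Writing $Y_i = X_i - \mu$, we see that $EY_i = 0$, $\operatorname{var}(Y_i) = \sigma^2$, and $EY_i^4 = \mu_4$. We have

$$n^2 Eb_2^2 = E\left(\sum_{i=1}^{n} Y_i^2 - n\overline{Y}^2\right)^2$$

$$= E\left\{\sum_{i=1}^{n} Y_i^4 + \sum_{i \neq j} Y_i^2 Y_j^2 - \frac{2}{n}\left(\sum_{i \neq j} Y_i^2 Y_j^2 + \sum_{j=1}^{n} Y_j^4\right) + \frac{1}{n^2}\left(3\sum_{i \neq j} Y_i^2 Y_j^2 + \sum_{i=1}^{n} Y_i^4\right)\right\},$$

where terms with zero expectation have been omitted. It follows that

$$n^2 Eb_2^2 = n\mu_4 + n(n - 1)\sigma^4 - \frac{2}{n}\left[n(n - 1)\sigma^4 + n\mu_4\right] + \frac{1}{n^2}\left[3n(n - 1)\sigma^4 + n\mu_4\right]$$

$$= \left(n - 2 + \frac{1}{n}\right)\mu_4 + \left(n - 2 + \frac{3}{n}\right)(n - 1)\mu_2^2 \quad (\mu_2 = \sigma^2).$$

Therefore,

$$\operatorname{var}(b_2) = Eb_2^2 - (Eb_2)^2$$

$$= \left(n - 2 + \frac{1}{n}\right)\frac{\mu_4}{n^2} + \left(n - 2 + \frac{3}{n}\right)\frac{(n - 1)\mu_2^2}{n^2} - \left(\frac{n - 1}{n}\right)^2\mu_2^2$$

$$= \left(n - 2 + \frac{1}{n}\right)\frac{\mu_4}{n^2} + \frac{(n - 1)(3 - n)}{n^3}\mu_2^2,$$

as asserted.
Relations (24) and (25) can be proved similarly.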


Corollary 1. $ES^2 = \sigma^2$.


This is precisely the reason why we call $S^2$, and not $b_2$, the sample variance.


Corollary 2. $\operatorname{var}(S^2) = \dfrac{\mu_4}{n} + \dfrac{3 - n}{n(n - 1)}\mu_2^2$.
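For a small discrete population these results can be confirmed exactly by enumerating every possible sample. The sketch below treats the faces of a fair die as a toy population with sample size $n = 4$ (both are illustrative choices) and uses exact rational arithmetic to compare $ES^2$ with $\sigma^2$, $Eb_2$ with $(n-1)\sigma^2/n$, and $\operatorname{var}(S^2)$ with $\mu_4/n + (3-n)\mu_2^2/[n(n-1)]$, the standard closed form for the variance of $S^2$.

```python
from fractions import Fraction
from itertools import product

faces = [Fraction(k) for k in range(1, 7)]   # fair die as a toy population
n = 4                                        # illustrative sample size

mu = sum(faces) / 6
mu2 = sum((x - mu) ** 2 for x in faces) / 6  # population variance sigma^2
mu4 = sum((x - mu) ** 4 for x in faces) / 6  # fourth central moment

def s2(sample):
    """Sample variance with divisor n - 1."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample) / (len(sample) - 1)

samples = list(product(faces, repeat=n))     # all 6**n equally likely samples
w = Fraction(1, len(samples))

E_S2 = sum(w * s2(s) for s in samples)
E_b2 = E_S2 * (n - 1) / n                    # b_2 = ((n - 1)/n) S^2
var_S2 = sum(w * s2(s) ** 2 for s in samples) - E_S2 ** 2
var_S2_formula = mu4 / n + Fraction(3 - n, n * (n - 1)) * mu2 ** 2

print(E_S2, mu2)
print(var_S2, var_S2_formula)
```

Since the enumeration is exact, the printed pairs agree as fractions, not merely to rounding error.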


Remark 1. The results of Theorems 3 to 5 can easily be modified and stated for the case
when the $X_i$'s are exchangeable RVs. Thus (16) holds and (17) has to be modified to

$$\operatorname{var}(\overline{X}) = \frac{\sigma^2}{n} + \frac{n - 1}{n}\rho\sigma^2, \quad (17')$$

where $\rho$ is the correlation coefficient between $X_i$ and $X_j$. The expressions for $(\sum X_i)^3$
and $(\sum X_i)^4$ in the proof of Theorem 3 still hold but both (18) and (19) need appropriate
modification. For example, (18) changes to

$$E\overline{X}^3 = \frac{m_3 + 3(n - 1)E(X_j^2 X_k) + (n - 1)(n - 2)E(X_j X_k X_l)}{n^2}. \quad (18')$$

Let us show how Corollary 1 changes for exchangeable RVs. Clearly,

$$(n - 1)S^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\overline{X} - \mu)^2$$

so that

$$(n - 1)ES^2 = n\sigma^2 - nE(\overline{X} - \mu)^2 = n\sigma^2 - \{\sigma^2 + (n - 1)\rho\sigma^2\}$$

in view of (17'). It follows that

$$ES^2 = \sigma^2(1 - \rho).$$

We note that $E(S^2 - \sigma^2) = -\rho\sigma^2$ and, moreover, from Problem 4.5.19 (or from (17')) we
note that $\rho \ge -1/(n - 1)$ so that $1 - \rho \le n/(n - 1)$ and hence

$$0 \le ES^2 \le \frac{n}{n - 1}\sigma^2.$$

Remark 2. In simple random sampling from a (finite) population of size $N$ we note that
when $n = N$, $\overline{X} = \mu$, which is a constant, so that (17') reduces to

$$0 = \frac{\sigma^2}{N} + \frac{N - 1}{N}\rho\sigma^2,$$

so that $\rho = -1/(N - 1)$. It follows that

$$\operatorname{var}(\overline{X}) = \frac{\sigma^2}{n}\left(1 - \frac{n - 1}{N - 1}\right) = \frac{N - n}{N - 1}\cdot\frac{\sigma^2}{n}. \quad (17'')$$

The factor $(N - n)/(N - 1)$ in (17'') is called the finite population correction factor. As
$N \to \infty$, with $n$ fixed, $(N - n)/(N - 1) \to 1$ so that the expression for $\operatorname{var}(\overline{X})$ in (17'')
approaches that in (17).
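The finite population correction can be checked by enumerating all samples drawn without replacement. The sketch below assumes a hypothetical population $\{1, 2, 3, 4, 5\}$ and $n = 2$, and uses exact rational arithmetic. Here $\sigma^2$ is the population variance computed with divisor $N$.

```python
from fractions import Fraction
from itertools import combinations

population = [Fraction(v) for v in (1, 2, 3, 4, 5)]   # hypothetical finite population
N, n = len(population), 2

mu = sum(population) / N
sigma2 = sum((v - mu) ** 2 for v in population) / N   # population variance (divisor N)

# Every subset of size n is an equally likely simple random sample.
means = [sum(s) / n for s in combinations(population, n)]
mean_of_means = sum(means) / len(means)
var_xbar = sum((m - mu) ** 2 for m in means) / len(means)

fpc = Fraction(N - n, N - 1)     # finite population correction factor
print(mean_of_means, mu)         # sample mean is unbiased for mu
print(var_xbar, fpc * sigma2 / n)
```

For this population both variance expressions equal $3/4$, compared with $\sigma^2/n = 1$ for sampling with replacement.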


Remark 3. In view of (17'), if the $X_i$'s are uncorrelated, that is, if $\rho = 0$, then $\operatorname{var}(\overline{X}) = \sigma^2/n$ and
the SD of $\overline{X}$ is $\sigma/\sqrt{n}$. The SD of $\overline{X}$ is sometimes called the standard error (SE), although if $\sigma$
is unknown $S/\sqrt{n}$ is most commonly referred to as the SE of $\overline{X}$.


The following result provides a justification for our definition of sample covariance. 


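Theorem 5. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate population
with variances $\sigma_1^2$, $\sigma_2^2$ and covariance $\rho\sigma_1\sigma_2$. Then

$$ES_1^2 = \sigma_1^2, \quad ES_2^2 = \sigma_2^2, \quad \text{and} \quad ES_{11} = \rho\sigma_1\sigma_2, \quad (26)$$

where $S_1^2$, $S_2^2$, and $S_{11}$ are defined in (12) and (13).
Proof. It follows from Corollary 1 to Theorem 4 that $ES_1^2 = \sigma_1^2$ and $ES_2^2 = \sigma_2^2$. To prove
that $ES_{11} = \rho\sigma_1\sigma_2$ we note that $X_i$ is independent of $X_j$ $(i \neq j)$ and $Y_j$ $(i \neq j)$. We have

$$(n - 1)ES_{11} = E\left\{\sum_{j=1}^{n} (X_j - \overline{X})(Y_j - \overline{Y})\right\}.$$

Now

$$E\{(X_j - \overline{X})(Y_j - \overline{Y})\} = E\left\{X_j Y_j - X_j\frac{\sum_{i=1}^{n} Y_i}{n} - Y_j\frac{\sum_{i=1}^{n} X_i}{n} + \frac{\sum_{i=1}^{n} X_i\sum_{i=1}^{n} Y_i}{n^2}\right\}$$

$$= EXY - \frac{1}{n}[EXY + (n - 1)EX\,EY] - \frac{1}{n}[EXY + (n - 1)EX\,EY] + \frac{1}{n^2}[n\,EXY + n(n - 1)EX\,EY]$$

$$= \frac{n - 1}{n}(EXY - EX\,EY),$$

and it follows that

$$(n - 1)ES_{11} = n\left(\frac{n - 1}{n}\right)(EXY - EX\,EY),$$

that is,

$$ES_{11} = EXY - EX\,EY = \operatorname{cov}(X, Y) = \rho\sigma_1\sigma_2,$$

as asserted.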


We next turn our attention to the distributions of sample characteristics. Several possibilities exist. If the exact sampling distribution is required, the method of transformation described in Section 4.4 can be used. Sometimes the technique of MGF or CF can be applied. Thus, if $X_1, X_2, \ldots, X_n$ is a random sample from a population distribution for which the MGF exists, the MGF of the sample mean $\bar{X}$ is given by

$$M_{\bar{X}}(t) = \prod_{i=1}^{n} Ee^{tX_i/n} = \left[M\left(\frac{t}{n}\right)\right]^n, \tag{27}$$

where $M$ is the MGF of the population distribution. If $M_{\bar{X}}(t)$ has one of the known forms, it is possible to write the PDF of $\bar{X}$. Although this method has the obvious drawback that it applies only to distributions for which all moments exist, we will see in Section 6.5 its effectiveness in the important case of sampling from a normal population where this condition is satisfied. An analog of (27) holds for CFs without any condition on existence of moments. Indeed,

$$\phi_{\bar{X}}(t) = \prod_{j=1}^{n} Ee^{itX_j/n} = \left[\phi\left(\frac{t}{n}\right)\right]^n, \tag{28}$$

where $\phi$ is the CF of $X_j$.

Example 2. Let $X_1, X_2, \ldots, X_n$ be a sample from a $G(\alpha, 1)$ distribution. We will compute the PDF of $\bar{X}$. We have

$$M_{\bar{X}}(t) = \left[M\left(\frac{t}{n}\right)\right]^n = \frac{1}{(1 - t/n)^{\alpha n}}, \qquad \frac{t}{n} < 1,$$

so that $\bar{X}$ is a $G(\alpha n, 1/n)$ variate.


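Example 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a uniform distribution on $(0, 1)$. Consider the geometric mean

$$Y_n = \left(\prod_{i=1}^{n} X_i\right)^{1/n}.$$

We have $\log Y_n = (1/n)\sum_{i=1}^{n} \log X_i$, so that $\log Y_n$ is the mean of $\log X_1, \ldots, \log X_n$. The common PDF of $\log X_1, \ldots, \log X_n$ is

$$f(x) = \begin{cases} e^{x} & \text{if } x < 0, \\ 0 & \text{otherwise,} \end{cases}$$

which is the negative exponential distribution with parameter $\beta = 1$. We see that the MGF of $\log Y_n$ is given by

$$M(t) = \prod_{i=1}^{n} Ee^{t\log X_i/n} = \frac{1}{(1 + t/n)^n},$$

and the PDF of $\log Y_n$ is given by

$$f^*(x) = \begin{cases} \dfrac{n^n}{\Gamma(n)}(-x)^{n-1}e^{nx}, & -\infty < x < 0, \\ 0, & \text{otherwise.} \end{cases}$$

It follows that $Y_n$ has PDF

$$f_{Y_n}(y) = \begin{cases} \dfrac{n^n}{\Gamma(n)}\,y^{n-1}(-\log y)^{n-1}, & 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases}$$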
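As a sanity check on Example 3, the PDF of the geometric mean can be integrated numerically. The sketch below is an illustration only (the function names and the midpoint rule are choices made here, not the book's method). It verifies that the density integrates to 1 and that its mean equals $[n/(n+1)]^n$, which follows from independence since $EX_i^{1/n} = n/(n+1)$ for a $U(0,1)$ variable.

```python
import math


def geometric_mean_pdf(y, n):
    """PDF of the geometric mean of n iid U(0,1) RVs, as derived in Example 3."""
    if not 0.0 < y < 1.0:
        return 0.0
    return n ** n / math.gamma(n) * y ** (n - 1) * (-math.log(y)) ** (n - 1)


def midpoint_integral(func, a, b, steps=200_000):
    """Composite midpoint rule; it never evaluates func at the endpoints."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


n = 3
total = midpoint_integral(lambda y: geometric_mean_pdf(y, n), 0.0, 1.0)
mean = midpoint_integral(lambda y: y * geometric_mean_pdf(y, n), 0.0, 1.0)
print(total, mean, (n / (n + 1)) ** n)
```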


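Example 4. (Hogben [46]). Let $X_1, X_2, \ldots, X_n$ be a random sample from a Bernoulli distribution with parameter $p$, $0 < p < 1$. Let $\bar{X}$ be the sample mean and $S^2$ the sample variance. We will find the PMF of $S^2$. Note that $S_n = \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} X_i^2$ and that $S_n$ is $b(n, p)$. Since

$$(n-1)S^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = \frac{S_n(n - S_n)}{n},$$

$S^2$ only assumes values of the form

$$t_i = \frac{i(n-i)}{n(n-1)}, \qquad i = 0, 1, 2, \ldots, \left[\frac{n}{2}\right],$$

where $[x]$ is the largest integer $\leq x$. Thus

$$P\{S^2 = t_i\} = P\{nS_n - S_n^2 = i(n-i)\}$$

$$= P\left\{\left(S_n - \frac{n}{2}\right)^2 = \left(i - \frac{n}{2}\right)^2\right\}$$

$$= P\{S_n = i \text{ or } S_n = n - i\}$$

$$= \binom{n}{i}p^i(1-p)^{n-i} + \binom{n}{i}p^{n-i}(1-p)^i$$

$$= \binom{n}{i}p^i(1-p)^i\{(1-p)^{n-2i} + p^{n-2i}\}, \qquad i < \frac{n}{2}.$$

If $n$ is even, $n = 2m$, say, where $m > 0$ is an integer, and $i = m$, then

$$P\left\{S^2 = \frac{m}{2(2m-1)}\right\} = \binom{2m}{m}p^m(1-p)^m.$$

In particular, if $n = 7$, $S^2 = 0$, $\frac{1}{7}$, $\frac{5}{21}$, and $\frac{2}{7}$ with probabilities $\{p^7 + (1-p)^7\}$, $7p(1-p)\{p^5 + (1-p)^5\}$, $21p^2(1-p)^2\{p^3 + (1-p)^3\}$, and $35p^3(1-p)^3$, respectively. If $n = 6$, then $S^2 = 0$, $\frac{1}{6}$, $\frac{4}{15}$, and $\frac{3}{10}$ with probabilities $\{p^6 + (1-p)^6\}$, $6p(1-p)\{p^4 + (1-p)^4\}$, $15p^2(1-p)^2\{p^2 + (1-p)^2\}$, and $20p^3(1-p)^3$, respectively.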
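The PMF of $S^2$ in Example 4 can be confirmed by brute force for small $n$. The following sketch (function names are illustrative, not from the text) enumerates all $2^n$ Bernoulli outcomes with exact rational arithmetic and compares the result with the closed form, treating the case $2i = n$ separately as in the example.

```python
from fractions import Fraction
from itertools import product
from math import comb


def s2_pmf_enumerated(n, p):
    """PMF of the sample variance by enumerating every 0/1 sample of size n."""
    pmf = {}
    for xs in product((0, 1), repeat=n):
        k = sum(xs)
        s2 = Fraction(k * (n - k), n * (n - 1))  # (n-1)S^2 = k(n-k)/n
        pmf[s2] = pmf.get(s2, 0) + p ** k * (1 - p) ** (n - k)
    return pmf


def s2_pmf_formula(n, p):
    """Closed form from Example 4, indexed by t_i = i(n-i)/(n(n-1))."""
    pmf = {}
    for i in range(n // 2 + 1):
        t = Fraction(i * (n - i), n * (n - 1))
        base = comb(n, i) * p ** i * (1 - p) ** i
        if 2 * i == n:
            pmf[t] = base
        else:
            pmf[t] = base * ((1 - p) ** (n - 2 * i) + p ** (n - 2 * i))
    return pmf


p = Fraction(1, 3)  # any rational 0 < p < 1 works
for n in (6, 7):
    print(n, s2_pmf_enumerated(n, p) == s2_pmf_formula(n, p))
```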


We have already considered the distribution of the sample quantiles in Section 4.7 and the distribution of range $X_{(n)} - X_{(1)}$ in Example 4.7.4. It can be shown, without much difficulty, that the distribution of the sample median is given by

$$f_r(y) = \frac{n!}{(r-1)!\,(n-r)!}\,[F(y)]^{r-1}[1 - F(y)]^{n-r}f(y) \qquad \text{if } r = \frac{n+1}{2}, \tag{29}$$

where $F$ and $f$ are the population DF and PDF, respectively. If $n = 2m$ and the median is taken as the average of $X_{(m)}$ and $X_{(m+1)}$, then

$$f(y) = \frac{2(2m)!}{[(m-1)!]^2}\int_y^{\infty}[F(2y - v)]^{m-1}[1 - F(v)]^{m-1}f(2y - v)f(v)\,dv. \tag{30}$$


Example 5. Let $X_1, X_2, \ldots, X_n$ be a random sample from $U(0, 1)$. Then the integrand in (30) is positive for the intersection of the regions $0 < 2y - v < 1$ and $0 < v < 1$. This gives $v/2 < y < (v+1)/2$, $y < v$, and $0 < v < 1$. The shaded area in Fig. 2 gives the limits on the integral as

$$y < v < 2y \quad \text{if } 0 < y \leq \tfrac{1}{2}$$

and

$$y < v < 1 \quad \text{if } \tfrac{1}{2} < y < 1.$$

Fig. 2 $\{y < v < 2y,\ 0 < y \leq 1/2,\ \text{and } y < v < 1,\ 1/2 < y < 1\}$.

In particular, if $m = 2$, the PDF of the median, $(X_{(2)} + X_{(3)})/2$, is given by

$$f(y) = \begin{cases} 8y^2(3 - 4y) & \text{if } 0 < y < \tfrac{1}{2}, \\ 8(4y^3 - 9y^2 + 6y - 1) & \text{if } \tfrac{1}{2} < y < 1, \\ 0 & \text{otherwise.} \end{cases}$$
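The closed form for $m = 2$ can be compared with a direct numerical evaluation of (30). In the sketch below (an illustration with made-up function names), $F(u) = u$ and $f(u) = 1$ on $(0, 1)$, so the integrand reduces to $(2y - v)^{m-1}(1 - v)^{m-1}$ over $y < v < \min(2y, 1)$; a midpoint rule is used for the integral.

```python
import math


def median_pdf_numeric(y, m, steps=20_000):
    """Evaluate (30) for U(0,1) samples of size 2m by the midpoint rule."""
    if not 0.0 < y < 1.0:
        return 0.0
    lo, hi = y, min(2.0 * y, 1.0)  # limits read off from Fig. 2
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        v = lo + (i + 0.5) * h
        total += (2.0 * y - v) ** (m - 1) * (1.0 - v) ** (m - 1)
    const = 2.0 * math.factorial(2 * m) / math.factorial(m - 1) ** 2
    return const * total * h


def median_pdf_closed_form(y):
    """Closed form of Example 5 for m = 2 (sample size 4)."""
    if 0.0 < y < 0.5:
        return 8.0 * y * y * (3.0 - 4.0 * y)
    if 0.5 <= y < 1.0:
        return 8.0 * (4.0 * y ** 3 - 9.0 * y ** 2 + 6.0 * y - 1.0)
    return 0.0


for y in (0.2, 0.5, 0.8):
    print(y, median_pdf_numeric(y, 2), median_pdf_closed_form(y))
```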


The method of MGF (or CF) introduced in this section is particularly effective in computing distributions of commonly used statistics in sampling from a univariate or bivariate normal distribution, as we shall see in the next two sections. However, when sampling from nonnormal populations these methods may not be very fruitful in determining the exact distribution of the statistic under consideration. Often the statistic itself may be too intractable. Then we have some other alternatives at our disposal. One may be able to use the asymptotic distribution of the statistic, or one may resort to simulation methods. In Chapter 7 we study some of these procedures.


PROBLEMS 6.3 


1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a DF $F$, and let $F_n^*(x)$ be the sample distribution function. Find $\operatorname{cov}(F_n^*(x), F_n^*(y))$ for fixed real numbers $x$, $y$.

2. Let $F_n^*$ be the empirical DF of a random sample from DF $F$. Show that

$$P\left\{|F_n^*(x) - F(x)| \geq \frac{\varepsilon}{\sqrt{n}}\right\} \leq \frac{1}{4\varepsilon^2} \qquad \text{for all } \varepsilon > 0.$$

3. For the data of Example 6.2.2 compute the sample distribution function.

4. (a) Show that the sample correlation coefficient $R$ satisfies $|R| \leq 1$ with equality if and only if all sample points lie on a straight line.
(b) If we write $U_i = aX_i + b$ $(a \neq 0)$ and $V_i = cY_i + d$ $(c \neq 0)$, what is the sample correlation coefficient between the $U$'s and the $V$'s?

5. (a) A sample of size 2 is taken from the PDF $f(x) = 1$, $0 \leq x \leq 1$, and $= 0$ otherwise. Find $P(\bar{X} \geq 0.9)$.
(b) A sample of size 2 is taken from $b(1, p)$:
(i) Find $P(\bar{X} \leq p)$. (ii) Find $P(S^2 \geq 0.5)$.

6. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Compute the first four sample moments of $\bar{X}$ about the origin and about the mean. Also compute the first four sample moments of $S^2$ about the mean.

7. Derive the PDF of the median given in (29) and (30).

8. Let $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ be the order statistic of a sample of size $n$ from $U(0, 1)$. Compute $EU_{(r)}^k$ for any $1 \leq r \leq n$ and integer $k$ $(> 0)$. In particular, show that

$$EU_{(r)} = \frac{r}{n+1} \qquad \text{and} \qquad \operatorname{var}(U_{(r)}) = \frac{r(n-r+1)}{(n+1)^2(n+2)}.$$

Show also that the correlation coefficient between $U_{(r)}$ and $U_{(s)}$ for $1 \leq r < s \leq n$ is given by $[r(n-s+1)/s(n-r+1)]^{1/2}$.

9. Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on $X$. Find the sampling distribution of $\bar{X}$, the sample mean, if (a) $X \sim P(\lambda)$, (b) $X \sim C(1, 0)$, and (c) $X \sim \chi^2(m)$.

10. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(\alpha, \beta)$. Let us write $Y_n = (\bar{X} - \alpha\beta)/\beta\sqrt{\alpha/n}$, $n = 1, 2, \ldots$.
(a) Compute the first four moments of $Y_n$, and compare them with the first four moments of the standard normal distribution.
(b) Compute the coefficients of skewness $\alpha_3$ and of kurtosis $\alpha_4$ for the RVs $Y_n$. (For definitions of $\alpha_3$, $\alpha_4$ see Problem 3.2.10.)

11. Let $X_1, X_2, \ldots, X_n$ be a random sample from $U[0, 1]$. Also let $Z_n = (\bar{X} - 0.5)/\sqrt{1/(12n)}$. Repeat Problem 10 for the sequence $Z_n$.

12. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Find $\operatorname{var}(S^2)$, and compare it with $\operatorname{var}(\bar{X})$. Note that $E\bar{X} = \lambda = ES^2$. [Hint: Use Problem 3.2.9.]

13. Prove (24) and (25).

14. Multiple RVs $X_1, X_2, \ldots, X_n$ are exchangeable if the $n!$ permutations $(X_{i_1}, X_{i_2}, \ldots, X_{i_n})$ have the same $n$-dimensional distribution. Consider the special case when the $X$'s are two dimensional. Find an analog of Theorem 6 for exchangeable bivariate RVs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$.

15. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with finite third moment. Show that $\operatorname{cov}(\bar{X}, S^2) = \mu_3/n$.


6.4 CHI-SQUARE, t-, AND F-DISTRIBUTIONS: EXACT SAMPLING 
DISTRIBUTIONS 


In this section we investigate certain distributions that arise in sampling from a normal population. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. Then we know that $\bar{X} \sim N(\mu, \sigma^2/n)$. Also, $\{\sqrt{n}(\bar{X} - \mu)/\sigma\}^2$ is $\chi^2(1)$. We will determine the distribution of $S^2$ in the next section. Here we mainly define chi-square, t-, and F-distributions and study their properties. Their importance will become evident in the next section and later in the testing of statistical hypotheses (Chapter 10).

The first distribution of interest is the chi-square distribution, defined in Chapter 5 as a special case of the gamma distribution. Let $n > 0$ be an integer. Then $G(n/2, 2)$ is a $\chi^2(n)$ RV. In view of Theorem 5.3.29 and Corollary 2 to Theorem 5.3.4, the following result holds.


Theorem 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs, and let $S_n = \sum_{k=1}^{n} X_k$. Then
(a) $S_n \sim \chi^2(n) \Leftrightarrow X_1 \sim \chi^2(1)$
and
(b) $X_1 \sim N(0, 1) \Rightarrow \sum_{k=1}^{n} X_k^2 \sim \chi^2(n)$.


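If $X$ has a chi-square distribution with $n$ d.f., we write $X \sim \chi^2(n)$. We recall that, if $X \sim \chi^2(n)$, its PDF is given by

$$f(x) = \begin{cases} \dfrac{x^{n/2-1}e^{-x/2}}{\Gamma(n/2)\,2^{n/2}} & \text{if } 0 < x < \infty, \\ 0 & \text{if } x \leq 0, \end{cases} \tag{1}$$

the MGF by

$$M(t) = (1 - 2t)^{-n/2} \qquad \text{for } t < \tfrac{1}{2}, \tag{2}$$

and the mean and the variance by

$$EX = n, \qquad \operatorname{var}(X) = 2n. \tag{3}$$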


The $\chi^2(n)$ distribution is tabulated for values of $n = 1, 2, \ldots$. Tables usually go up to $n = 30$, since for $n > 30$ it is possible to use the normal approximation. In Fig. 1 we plot the PDF (1) for selected values of $n$.

Fig. 1 Chi-square densities.


We will write $\chi^2_{n,\alpha}$ for the upper $\alpha$ percent point of the $\chi^2(n)$ distribution, that is,

$$P\{\chi^2(n) > \chi^2_{n,\alpha}\} = \alpha. \tag{4}$$

Table ST3 at the end of the book gives the values of $\chi^2_{n,\alpha}$ for some selected values of $n$ and $\alpha$.

Example 1. Let $n = 25$. Then, from Table ST3,

$$P\{\chi^2(25) \leq 34.382\} = 0.90.$$

Let us approximate this probability using CLT. We see that $E\chi^2(25) = 25$, $\operatorname{var}\chi^2(25) = 50$, so that

$$P\{\chi^2(25) \leq 34.382\} = P\left\{\frac{\chi^2(25) - 25}{\sqrt{50}} \leq \frac{34.382 - 25}{5\sqrt{2}}\right\} \approx P\{Z \leq 1.32\} = 0.9066.$$

Definition 1. Let $X_1, X_2, \ldots, X_n$ be independent normal RVs with $EX_i = \mu_i$ and $\operatorname{var}(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Also, let $Y = \sum_{i=1}^{n} X_i^2/\sigma^2$. The RV $Y$ is said to be a noncentral chi-square RV with noncentrality parameter $\sum_{i=1}^{n} \mu_i^2/\sigma^2$ and $n$ d.f. We will write $Y \sim \chi^2(n, \delta)$, where $\delta = \sum_{i=1}^{n} \mu_i^2/\sigma^2$.


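Although the PDF of a $\chi^2(n, \delta)$ RV is hard to compute (see Problem 16), its MGF is easily evaluated. We have

$$M(t) = Ee^{t\sum_{i=1}^{n} X_i^2/\sigma^2} = \prod_{i=1}^{n} Ee^{tX_i^2/\sigma^2},$$

where $X_i \sim N(\mu_i, \sigma^2)$. Thus

$$Ee^{tX_i^2/\sigma^2} = \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{\frac{tx_i^2}{\sigma^2} - \frac{(x_i - \mu_i)^2}{2\sigma^2}\right\}dx_i,$$

where the integral exists for $t < \frac{1}{2}$. In the integrand we complete squares, and after some simple algebra we obtain

$$Ee^{tX_i^2/\sigma^2} = \frac{1}{\sqrt{1 - 2t}}\exp\left\{\frac{t\mu_i^2}{\sigma^2(1 - 2t)}\right\}, \qquad t < \frac{1}{2}.$$

It follows that

$$M(t) = (1 - 2t)^{-n/2}\exp\left(\frac{t\sum_{i=1}^{n}\mu_i^2}{\sigma^2(1 - 2t)}\right), \qquad t < \frac{1}{2}, \tag{5}$$

and the MGF of a $\chi^2(n, \delta)$ RV is therefore

$$M(t) = (1 - 2t)^{-n/2}\exp\left(\frac{t\delta}{1 - 2t}\right), \qquad t < \frac{1}{2}. \tag{6}$$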


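It is immediate that, if $X_1, X_2, \ldots, X_k$ are independent, $X_i \sim \chi^2(n_i, \delta_i)$, $i = 1, 2, \ldots, k$, then $\sum_{i=1}^{k} X_i$ is $\chi^2\left(\sum_{i=1}^{k} n_i, \sum_{i=1}^{k} \delta_i\right)$.

The mean and variance of $\chi^2(n, \delta)$ are easy to calculate. We have

$$EY = \frac{\sum_{i=1}^{n} EX_i^2}{\sigma^2} = \frac{\sum_{i=1}^{n}[\operatorname{var}(X_i) + (EX_i)^2]}{\sigma^2} = \frac{n\sigma^2 + \sum_{i=1}^{n}\mu_i^2}{\sigma^2} = n + \delta,$$

and

$$\operatorname{var}(Y) = \operatorname{var}\left(\frac{\sum_{i=1}^{n} X_i^2}{\sigma^2}\right) = \frac{1}{\sigma^4}\sum_{i=1}^{n}\operatorname{var}(X_i^2)$$

$$= \frac{1}{\sigma^4}\left\{\sum_{i=1}^{n}(3\sigma^4 + 6\sigma^2\mu_i^2 + \mu_i^4) - \sum_{i=1}^{n}(\sigma^2 + \mu_i^2)^2\right\}$$

$$= \frac{1}{\sigma^4}\left(2n\sigma^4 + 4\sigma^2\sum_{i=1}^{n}\mu_i^2\right) = 2n + 4\delta.$$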
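The formulas $EY = n + \delta$ and $\operatorname{var}(Y) = 2n + 4\delta$ lend themselves to a quick simulation check. The following is a rough Monte Carlo sketch, not part of the text: the means, the common $\sigma$, the number of draws, and the seed are arbitrary choices, and the agreement is only up to sampling error.

```python
import random


def simulate_noncentral_chi2(mus, sigma, draws, seed=0):
    """Simulate Y = sum X_i^2 / sigma^2 with independent X_i ~ N(mu_i, sigma^2);
    return the sample mean and sample variance of Y."""
    rng = random.Random(seed)
    ys = [
        sum(rng.gauss(mu, sigma) ** 2 for mu in mus) / sigma ** 2
        for _ in range(draws)
    ]
    mean = sum(ys) / draws
    var = sum((y - mean) ** 2 for y in ys) / (draws - 1)
    return mean, var


mus, sigma = [1.0, -2.0, 0.5], 2.0           # hypothetical parameters
n = len(mus)
delta = sum(mu ** 2 for mu in mus) / sigma ** 2
mean, var = simulate_noncentral_chi2(mus, sigma, draws=100_000)
print(mean, n + delta)        # simulated vs. theoretical mean
print(var, 2 * n + 4 * delta)  # simulated vs. theoretical variance
```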


We next turn our attention to Student's t-statistic, which arises quite naturally in sampling from a normal population.

Definition 2. Let $X \sim N(0, 1)$ and $Y \sim \chi^2(n)$, and let $X$ and $Y$ be independent. Then the statistic

$$T = \frac{X}{\sqrt{Y/n}} \tag{7}$$

is said to have a t-distribution with $n$ d.f. and we write $T \sim t(n)$.

Theorem 2. The PDF of $T$ defined in (7) is given by

$$f_n(t) = \frac{\Gamma[(n+1)/2]}{\Gamma(n/2)\sqrt{n\pi}}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \qquad -\infty < t < \infty. \tag{8}$$

Proof. The proof is left as an exercise.


Remark 1. For $n = 1$, $T$ is a Cauchy RV. We will therefore assume that $n > 1$. For each $n$, we have a different PDF. In Fig. 2 we plot $f_n(t)$ for some selected values of $n$. Like the normal distribution, the t-distribution is important in the theory of statistics and hence is tabulated (Table ST4).

Fig. 2 Student's t-densities.

Fig. 3 [The $t(n)$ density with area $\alpha/2$ in each tail beyond $-t(n, \alpha/2)$ and $t(n, \alpha/2)$.]


Remark 2. The PDF $f_n(t)$ is symmetric in $t$, and $f_n(t) \to 0$ as $t \to \pm\infty$. For large $n$, the t-distribution is close to the normal distribution. Indeed, $(1 + t^2/n)^{-(n+1)/2} \to e^{-t^2/2}$ as $n \to \infty$. Moreover, as $t \to \infty$ or $t \to -\infty$, the tails of $f_n(t) \to 0$ much more slowly than do the tails of the $N(0, 1)$ PDF. Thus for small $n$ and large $t_0$

$$P\{|T| > t_0\} \geq P\{|Z| > t_0\}, \qquad Z \sim N(0, 1),$$

that is, there is more probability in the tail of the t-distribution than in the tail of the standard normal. In what follows we will write $t_{n,\alpha/2}$ for the value (Fig. 3) of $T$ for which

$$P\{|T| > t_{n,\alpha/2}\} = \alpha. \tag{9}$$

In Table ST4 positive values of $t_{n,\alpha}$ are tabulated for some selected values of $n$ and $\alpha$. Negative values may be obtained from symmetry, $t_{n,1-\alpha} = -t_{n,\alpha}$.
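The tail comparison in Remark 2 can be made concrete by integrating the PDF (8) numerically. The sketch below is illustrative only: the function names are made up, Simpson's rule is a choice made here, and 2.571 is taken as the commonly tabulated upper 0.025 point of $t(5)$.

```python
import math


def t_pdf(t, n):
    """PDF (8) of the t-distribution with n d.f."""
    c = math.gamma((n + 1) / 2) / (math.gamma(n / 2) * math.sqrt(n * math.pi))
    return c * (1.0 + t * t / n) ** (-(n + 1) / 2)


def t_upper_tail(t0, n, steps=20_000):
    """P{T > t0} for t0 >= 0: Simpson's rule on [0, t0] plus symmetry of f_n.
    steps must be even."""
    h = t0 / steps
    s = t_pdf(0.0, n) + t_pdf(t0, n)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, n)
    return 0.5 - s * h / 3.0


def normal_upper_tail(z0):
    """P{Z > z0} for Z ~ N(0, 1)."""
    return 0.5 * (1.0 - math.erf(z0 / math.sqrt(2.0)))


print(2 * t_upper_tail(2.571, 5))                       # close to 0.05
print(2 * t_upper_tail(2.0, 5), 2 * normal_upper_tail(2.0))  # t tail is heavier
```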


Example 2. Let $n = 5$. Then from Table ST4, we get $t_{5,0.025} = 2.571$ and $t_{5,0.05} = 2.015$. The corresponding values under the $N(0, 1)$ distribution are $z_{0.025} = 1.96$ and $z_{0.05} = 1.65$. For $n = 30$,

$$t_{30,0.05} = 1.697 \qquad \text{and} \qquad z_{0.05} = 1.65.$$

Theorem 3. Let $X \sim t(n)$, $n > 1$. Then $EX^r$ exists for $r < n$. In particular, if $r < n$ is odd,

$$EX^r = 0, \tag{10}$$

and if $r < n$ is even,

$$EX^r = n^{r/2}\,\frac{\Gamma[(r+1)/2]\,\Gamma[(n-r)/2]}{\Gamma(1/2)\,\Gamma(n/2)}. \tag{11}$$

Corollary. If $n > 2$, $EX = 0$ and $EX^2 = \operatorname{var}(X) = n/(n-2)$.


Remark 3. If in Definition 2 we take $X \sim N(\mu, \sigma^2)$, $Y/\sigma^2 \sim \chi^2(n)$, and $X$ and $Y$ independent,

$$T = \frac{X}{\sqrt{Y/n}}$$

is said to have a noncentral t-distribution with parameter (also called noncentrality parameter) $\delta = \mu/\sigma$ and d.f. $n$. Various moments of the noncentral t-distribution may be computed by using the fact that the expectation of a product of independent RVs is the product of their expectations.

We leave the reader to show (Problem 3) that, if $T$ has a noncentral t-distribution with $n$ d.f. and noncentrality parameter $\delta$, then

$$ET = \delta\,\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\sqrt{\frac{n}{2}}, \qquad n > 1, \tag{12}$$

and

$$\operatorname{var}(T) = \frac{n(1 + \delta^2)}{n-2} - \frac{\delta^2 n}{2}\left(\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\right)^2, \qquad n > 2. \tag{13}$$
Definition 3. Let $X$ and $Y$ be independent $\chi^2$ RVs with $m$ and $n$ d.f., respectively. The RV

$$F = \frac{X/m}{Y/n} \tag{14}$$

is said to have an F-distribution with $(m, n)$ d.f., and we write $F \sim F(m, n)$.

Theorem 4. The PDF of the F-statistic defined in (14) is given by

$$g(f) = \begin{cases} \dfrac{\Gamma[(m+n)/2]}{\Gamma(m/2)\Gamma(n/2)}\left(\dfrac{m}{n}\right)\left(\dfrac{m}{n}f\right)^{(m/2)-1}\left(1 + \dfrac{m}{n}f\right)^{-(m+n)/2}, & f > 0, \\ 0, & f \leq 0. \end{cases} \tag{15}$$

Proof. The proof is left as an exercise.

Remark 4. If $X \sim F(m, n)$, then $1/X \sim F(n, m)$. If we take $m = 1$, then $F = [t(n)]^2$, so that $F(1, n)$ and $t^2(n)$ have the same distribution. It also follows that, if $Z$ is $C(1, 0)$ [which is the same as $t(1)$], $Z^2$ is $F(1, 1)$.


Remark 5. As usual, we write $F_{m,n,\alpha}$ for the upper $\alpha$ percent point of the $F(m, n)$ distribution, that is,

$$P\{F(m, n) > F_{m,n,\alpha}\} = \alpha. \tag{16}$$

From Remark 4, we have the following relation:

$$F_{m,n,1-\alpha} = \frac{1}{F_{n,m,\alpha}}. \tag{17}$$

It therefore suffices to tabulate values of $F$ that are $\geq 1$. This is done in Table ST5, where values of $F_{m,n,\alpha}$ are listed for some selected values of $m$, $n$, and $\alpha$. See Fig. 4 for a plot of $g(f)$.


Theorem 5. Let $X \sim F(m, n)$. Then, for $k > 0$, integral,

$$EX^k = \left(\frac{n}{m}\right)^k\frac{\Gamma[k + (m/2)]\,\Gamma[(n/2) - k]}{\Gamma(m/2)\,\Gamma(n/2)} \qquad \text{for } n > 2k. \tag{18}$$

In particular,

$$EX = \frac{n}{n-2}, \qquad n > 2, \tag{19}$$

and

$$\operatorname{var}(X) = \frac{n^2(2m + 2n - 4)}{m(n-2)^2(n-4)}, \qquad n > 4. \tag{20}$$

Fig. 4 F densities.
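Proof. We have, for a positive integer $k$,

$$\int_0^{\infty} f^k f^{(m/2)-1}\left(1 + \frac{m}{n}f\right)^{-(m+n)/2}df = \left(\frac{n}{m}\right)^{k+(m/2)}\int_0^1 x^{k+(m/2)-1}(1 - x)^{(n/2)-k-1}\,dx, \tag{21}$$

where we have changed the variable to $x = (m/n)f[1 + (m/n)f]^{-1}$. The integral in the right side of (21) converges for $(n/2) - k > 0$ and diverges for $(n/2) - k \leq 0$. We have

$$EX^k = \frac{\Gamma[(m+n)/2]}{\Gamma(m/2)\Gamma(n/2)}\left(\frac{m}{n}\right)^{m/2}\left(\frac{n}{m}\right)^{k+(m/2)}B\left(k + \frac{m}{2}, \frac{n}{2} - k\right),$$

as asserted.

For $k = 1$, we get

$$EX = \frac{n}{m}\cdot\frac{m/2}{(n/2) - 1} = \frac{n}{n-2}, \qquad n > 2.$$

Also,

$$EX^2 = \left(\frac{n}{m}\right)^2\frac{(m/2)[(m/2) + 1]}{[(n/2) - 1][(n/2) - 2]} = \left(\frac{n}{m}\right)^2\frac{m(m+2)}{(n-2)(n-4)}, \qquad n > 4,$$

and

$$\operatorname{var}(X) = \left(\frac{n}{m}\right)^2\frac{m(m+2)}{(n-2)(n-4)} - \left(\frac{n}{n-2}\right)^2 = \frac{2n^2(m + n - 2)}{m(n-2)^2(n-4)}, \qquad n > 4.$$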
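The moment formula (18) and its special cases (19) and (20) can be cross-checked numerically. The sketch below is an illustration (the function names are not from the text); it evaluates (18) with the gamma function and compares the resulting mean and variance with the closed forms.

```python
import math


def f_raw_moment(k, m, n):
    """k-th raw moment of F(m, n) from (18); requires n > 2k."""
    if n <= 2 * k:
        raise ValueError("the k-th moment exists only for n > 2k")
    return (
        (n / m) ** k
        * math.gamma(k + m / 2)
        * math.gamma(n / 2 - k)
        / (math.gamma(m / 2) * math.gamma(n / 2))
    )


def f_mean(n):
    """Closed form (19)."""
    return n / (n - 2)


def f_variance(m, n):
    """Closed form (20)."""
    return n ** 2 * (2 * m + 2 * n - 4) / (m * (n - 2) ** 2 * (n - 4))


m, n = 5, 10
mean = f_raw_moment(1, m, n)
var = f_raw_moment(2, m, n) - mean ** 2
print(mean, f_mean(n))
print(var, f_variance(m, n))
```

Note that the mean does not depend on $m$, as (19) shows.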


Theorem 6. If $X \sim F(m, n)$, then $Y = 1/[1 + (m/n)X]$ is $B(n/2, m/2)$. Consequently, for each $x > 0$,

$$P\{X \leq x\} = 1 - P\left\{Y \leq \left[1 + \frac{m}{n}x\right]^{-1}\right\}.$$

If in Definition 3 we take $X$ to be a noncentral $\chi^2$ RV with $m$ d.f. and noncentrality parameter $\delta$, we get a noncentral F RV.

Definition 4. Let $X \sim \chi^2(m, \delta)$ and $Y \sim \chi^2(n)$, and let $X$ and $Y$ be independent. Then the RV

$$F = \frac{X/m}{Y/n} \tag{22}$$

is said to have a noncentral F-distribution with noncentrality parameter $\delta$.

It is shown in Problem 2 that if $F$ has a noncentral F-distribution with $(m, n)$ d.f. and noncentrality parameter $\delta$,

$$EF = \frac{n(m + \delta)}{m(n-2)}, \qquad n > 2,$$

and

$$\operatorname{var}(F) = \frac{2n^2}{m^2(n-4)(n-2)^2}\left[(m + \delta)^2 + (n-2)(m + 2\delta)\right], \qquad n > 4.$$
PROBLEMS 6.4 
1. Let

$$P_x = \left\{\Gamma\left(\frac{n}{2}\right)2^{n/2}\right\}^{-1}\int_0^x \omega^{(n-2)/2}e^{-\omega/2}\,d\omega, \qquad x > 0.$$


Show that 


= 


2. Let $X \sim F(m, n, \delta)$. Find $EX$ and $\operatorname{var}(X)$.

3. Let $T$ be a noncentral t-statistic with $n$ d.f. and noncentrality parameter $\delta$. Find $ET$ and $\operatorname{var}(T)$.

4. Let $F \sim F(m, n)$. Then

$$Y = \left[1 + \frac{m}{n}F\right]^{-1} \text{ is } B\left(\frac{n}{2}, \frac{m}{2}\right).$$

Deduce that for $x > 0$

$$P\{F \leq x\} = 1 - P\left\{Y \leq \left(1 + \frac{m}{n}x\right)^{-1}\right\}.$$

5. Derive the PDF of an F-statistic with $(m, n)$ d.f.

6. Show that the square of a noncentral t-statistic is a noncentral F-statistic.

7. A sample of size 16 showed a variance of 5.76. Find $c$ such that $P\{|\bar{X} - \mu| < c\} = 0.95$, where $\bar{X}$ is the sample mean and $\mu$ is the population mean. Assume that the sample comes from a normal population.

8. A sample from a normal population produced variance 4.0. Find the size of the sample if the sample mean deviates from the population mean by no more than 2.0 with a probability of at least 0.95.

9. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from $N(0, 4)$. Find $P\left\{\sum_{i=1}^{5} X_i^2 \geq 5.75\right\}$.

10. Let $X \sim \chi^2(61)$. Find $P\{X > 50\}$.

11. Let $F \sim F(m, n)$. The random variable $Z = \frac{1}{2}\log F$ is known as Fisher's Z statistic. Find the PDF of $Z$.

12. Prove Theorem 1.

13. Prove Theorem 2.

14. Prove Theorem 3.

15. Prove Theorem 4.

16. (a) Let $f_1, f_2, \ldots$ be PDFs with corresponding MGFs $M_1, M_2, \ldots$, respectively. Let $\alpha_j$ $(0 < \alpha_j < 1)$ be constants such that $\sum_{j=1}^{\infty}\alpha_j = 1$. Then $f = \sum_{j=1}^{\infty}\alpha_j f_j$ is a PDF with MGF $M = \sum_{j=1}^{\infty}\alpha_j M_j$.

(b) Write the MGF of a $\chi^2(n, \delta)$ RV in (6) as

$$M(t) = \sum_{j=0}^{\infty}\alpha_j M_j(t),$$

where $M_j(t) = (1 - 2t)^{-(2j+n)/2}$ is the MGF of a $\chi^2(2j + n)$ RV and $\alpha_j = e^{-\delta/2}(\delta/2)^j/j!$ is the PMF of a $P(\delta/2)$ RV. Conclude that the PDF of $Y \sim \chi^2(n, \delta)$ is the weighted sum of PDFs of $\chi^2(2j + n)$ RVs, $j = 0, 1, 2, \ldots$, with Poisson weights and hence

$$f(y) = \sum_{j=0}^{\infty}\frac{e^{-\delta/2}(\delta/2)^j}{j!}\cdot\frac{y^{(2j+n)/2-1}\exp(-y/2)}{2^{(2j+n)/2}\,\Gamma\left(\frac{2j+n}{2}\right)}.$$


6.5 DISTRIBUTION OF $(\bar{X}, S^2)$ IN SAMPLING FROM A NORMAL POPULATION


Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, and write $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ and $S^2 = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. In this section we show that $\bar{X}$ and $S^2$ are independent and derive the distribution of $S^2$. More precisely, we prove the following important result.


Theorem 1. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Then $\bar{X}$ and $(X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X})$ are independent.


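Proof. We compute the joint MGF of $\bar{X}$ and $X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ as follows. Writing $\bar{t} = n^{-1}\sum_{i=1}^{n} t_i$,

$$M(t, t_1, t_2, \ldots, t_n) = E\exp\{t\bar{X} + t_1(X_1 - \bar{X}) + t_2(X_2 - \bar{X}) + \cdots + t_n(X_n - \bar{X})\}$$

$$= E\exp\left\{\sum_{i=1}^{n} t_iX_i - \left(\sum_{i=1}^{n} t_i - t\right)\bar{X}\right\}$$

$$= E\exp\left\{\sum_{i=1}^{n} X_i\left(t_i - \frac{t_1 + t_2 + \cdots + t_n - t}{n}\right)\right\}$$

$$= E\left[\prod_{i=1}^{n}\exp\left\{\frac{X_i(nt_i - n\bar{t} + t)}{n}\right\}\right]$$

$$= \prod_{i=1}^{n} E\exp\left\{\frac{X_i[t + n(t_i - \bar{t})]}{n}\right\}$$

$$= \prod_{i=1}^{n}\exp\left\{\frac{\mu[t + n(t_i - \bar{t})]}{n} + \frac{\sigma^2}{2}\cdot\frac{1}{n^2}[t + n(t_i - \bar{t})]^2\right\}$$

$$= \exp\left\{\frac{\mu}{n}\left[nt + n\sum_{i=1}^{n}(t_i - \bar{t})\right] + \frac{\sigma^2}{2n^2}\sum_{i=1}^{n}[t + n(t_i - \bar{t})]^2\right\}$$

$$= \exp(\mu t)\exp\left\{\frac{\sigma^2}{2n^2}\left(nt^2 + n^2\sum_{i=1}^{n}(t_i - \bar{t})^2\right)\right\}$$

$$= \exp\left(\mu t + \frac{\sigma^2}{2n}t^2\right)\exp\left\{\frac{\sigma^2}{2}\sum_{i=1}^{n}(t_i - \bar{t})^2\right\}$$

$$= M_{\bar{X}}(t)\,M_{X_1 - \bar{X}, \ldots, X_n - \bar{X}}(t_1, t_2, \ldots, t_n)$$

$$= M(t, 0, 0, \ldots, 0)\,M(0, t_1, \ldots, t_n).$$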

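Corollary 1. $\bar{X}$ and $S^2$ are independent.

Corollary 2. $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$.

Since

$$\sum_{i=1}^{n}\frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n), \qquad n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2 \sim \chi^2(1),$$

and $\bar{X}$ and $S^2$ are independent, it follows from

$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2 + (n-1)\frac{S^2}{\sigma^2}$$

that

$$E\exp\left\{t\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2\right\} = E\exp\left\{tn\left(\frac{\bar{X} - \mu}{\sigma}\right)^2 + t(n-1)\frac{S^2}{\sigma^2}\right\}$$

$$= E\exp\left\{tn\left(\frac{\bar{X} - \mu}{\sigma}\right)^2\right\}E\exp\left\{t(n-1)\frac{S^2}{\sigma^2}\right\},$$

that is,

$$(1 - 2t)^{-n/2} = (1 - 2t)^{-1/2}\,E\exp\left\{(n-1)\frac{S^2}{\sigma^2}t\right\}, \qquad t < \frac{1}{2},$$

and it follows that

$$E\exp\left\{(n-1)\frac{S^2}{\sigma^2}t\right\} = (1 - 2t)^{-(n-1)/2}, \qquad t < \frac{1}{2}.$$

By the uniqueness of the MGF it follows that $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$.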
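Corollaries 1 and 2 can be illustrated by simulation. The sketch below is a rough Monte Carlo check and not a proof: the parameter values, replication count, and seed are arbitrary, and zero sample correlation is only a necessary consequence of independence. It checks that $(n-1)S^2/\sigma^2$ has mean $n - 1$ and variance $2(n-1)$, as a $\chi^2(n-1)$ RV should by (3) of Section 6.4, and that $\bar{X}$ and $S^2$ are empirically uncorrelated.

```python
import random


def simulate_mean_and_s2(n, mu, sigma, reps, seed=1):
    """Draw reps normal samples of size n; return lists of sample means
    and sample variances (divisor n - 1)."""
    rng = random.Random(seed)
    means, s2s = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        means.append(xbar)
        s2s.append(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return means, s2s


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def correlation(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return cov / norm


n, mu, sigma = 5, 2.0, 3.0  # hypothetical parameters
means, s2s = simulate_mean_and_s2(n, mu, sigma, reps=20_000)
scaled = [(n - 1) * s2 / sigma ** 2 for s2 in s2s]
print(mean(scaled), variance(scaled), correlation(means, s2s))
```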
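Corollary 3. The distribution of $\sqrt{n}(\bar{X} - \mu)/S$ is $t(n-1)$.

Proof. Since $\sqrt{n}(\bar{X} - \mu)/\sigma$ is $N(0, 1)$, and $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$ and since $\bar{X}$ and $S^2$ are independent,

$$\frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{[(n-1)S^2/\sigma^2]/(n-1)}} = \frac{\sqrt{n}(\bar{X} - \mu)}{S}$$

is $t(n-1)$.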


Corollary 4. If $X_1, X_2, \ldots, X_m$ are iid $N(\mu_1, \sigma_1^2)$ RVs, $Y_1, Y_2, \ldots, Y_n$ are iid $N(\mu_2, \sigma_2^2)$ RVs, and the two samples are independently taken, $(S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)$ is $F(m-1, n-1)$. If, in particular, $\sigma_1 = \sigma_2$, then $S_1^2/S_2^2$ is $F(m-1, n-1)$.

Corollary 5. Let $X_1, X_2, \ldots, X_m$, and $Y_1, Y_2, \ldots, Y_n$, respectively, be independent samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. Then

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\{[(m-1)S_1^2/\sigma_1^2] + [(n-1)S_2^2/\sigma_2^2]\}^{1/2}}\sqrt{\frac{m + n - 2}{\sigma_1^2/m + \sigma_2^2/n}} \sim t(m + n - 2).$$

In particular, if $\sigma_1 = \sigma_2$, then

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{(m-1)S_1^2 + (n-1)S_2^2}}\sqrt{\frac{mn(m + n - 2)}{m + n}} \sim t(m + n - 2).$$

Corollary 5 follows since

$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}\right) \qquad \text{and} \qquad \frac{(m-1)S_1^2}{\sigma_1^2} + \frac{(n-1)S_2^2}{\sigma_2^2} \sim \chi^2(m + n - 2)$$

and the two statistics are independent.

Remark 1. The converse of Corollary 1 also holds. See Theorem 5.3.28.

Remark 2. In sampling from a symmetric distribution, $\bar{X}$ and $S^2$ are uncorrelated. See Problem 4.5.14.

Remark 3. Alternatively, Corollary 1 could have been derived from Corollary 2 to Theorem 5.4.6 by using the Helmert orthogonal matrix:

$$A = \begin{pmatrix} 1/\sqrt{n} & 1/\sqrt{n} & 1/\sqrt{n} & \cdots & 1/\sqrt{n} \\ -1/\sqrt{2} & 1/\sqrt{2} & 0 & \cdots & 0 \\ -1/\sqrt{6} & -1/\sqrt{6} & 2/\sqrt{6} & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ -1/\sqrt{n(n-1)} & -1/\sqrt{n(n-1)} & -1/\sqrt{n(n-1)} & \cdots & (n-1)/\sqrt{n(n-1)} \end{pmatrix}.$$

For the case of $n = 3$ this was done in Example 4.4.6. In Problem 7 the reader is asked to work out the details in the general case.
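The two properties of the Helmert matrix that drive the argument in Remark 3 are easy to verify numerically: the matrix is orthogonal, and under $y = Ax$ the first coordinate is $\sqrt{n}\,\bar{x}$ while the remaining coordinates carry $\sum (x_i - \bar{x})^2$. The following sketch is illustrative only; the function names and the data vector are made up.

```python
import math


def helmert_matrix(n):
    """Rows of the n x n Helmert matrix as lists of floats."""
    rows = [[1.0 / math.sqrt(n)] * n]
    for k in range(1, n):
        scale = math.sqrt(k * (k + 1))
        # k entries equal to -1/scale, then k/scale, then zeros
        rows.append([-1.0 / scale] * k + [k / scale] + [0.0] * (n - k - 1))
    return rows


def mat_vec(matrix, vec):
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


xs = [1.0, 4.0, 2.0, 8.0]  # hypothetical data
n = len(xs)
A = helmert_matrix(n)
ys = mat_vec(A, xs)
xbar = sum(xs) / n
print(ys[0], math.sqrt(n) * xbar)                                # first coordinate
print(sum(y * y for y in ys[1:]), sum((x - xbar) ** 2 for x in xs))  # (n-1)s^2
```

Since an orthogonal map sends iid $N(0, \sigma^2)$ coordinates to iid $N(0, \sigma^2)$ coordinates, the split between $y_1$ and $(y_2, \ldots, y_n)$ is what yields the independence of $\bar{X}$ and $S^2$.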


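Remark 4. An analytic approach to the development of the distribution of $\bar{X}$ and $S^2$ is as follows. Assuming without loss of generality that $X_i$ is $N(0, 1)$, we have as the joint PDF of $(X_1, X_2, \ldots, X_n)$

$$f(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}}\exp\left\{-\frac{1}{2}\sum_{i=1}^{n} x_i^2\right\} = \frac{1}{(2\pi)^{n/2}}\exp\left\{-\frac{(n-1)s^2 + n\bar{x}^2}{2}\right\}.$$

Changing the variables to $y_1, y_2, \ldots, y_n$ by using the transformation $y_k = (x_k - \bar{x})/s$, we see that

$$\sum_{k=1}^{n} y_k = 0 \qquad \text{and} \qquad \sum_{k=1}^{n} y_k^2 = n - 1.$$

It follows that two of the $y_k$'s, say $y_{n-1}$ and $y_n$, are functions of the remaining $y_k$. Thus either

$$y_{n-1} = \frac{\alpha + \beta}{2}, \qquad y_n = \frac{\alpha - \beta}{2},$$

or

$$y_{n-1} = \frac{\alpha - \beta}{2}, \qquad y_n = \frac{\alpha + \beta}{2},$$

where

$$\alpha = -\sum_{k=1}^{n-2} y_k \qquad \text{and} \qquad \beta = \sqrt{2(n-1) - 2\sum_{k=1}^{n-2} y_k^2 - \left(\sum_{k=1}^{n-2} y_k\right)^2}.$$

We leave the reader to derive the joint PDF of $(Y_1, Y_2, \ldots, Y_{n-2}, \bar{X}, S^2)$, using the result described in Remark 2, and to show that the RVs $\bar{X}$, $S^2$, and $(Y_1, Y_2, \ldots, Y_{n-2})$ are independent.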


PROBLEMS 6.5 


1. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$ and $\bar{X}$ and $S^2$, respectively, be the sample mean and the sample variance. Let $X_{n+1} \sim N(\mu, \sigma^2)$, and assume that $X_1, X_2, \ldots, X_n, X_{n+1}$ are independent. Find the sampling distribution of $[(X_{n+1} - \bar{X})/S]\sqrt{n/(n+1)}$.

2. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively. Also, let $\alpha$, $\beta$ be two fixed real numbers. If $\bar{X}$, $\bar{Y}$ denote the corresponding sample means, what is the sampling distribution of

$$\frac{\alpha(\bar{X} - \mu_1) + \beta(\bar{Y} - \mu_2)}{\left\{\dfrac{(m-1)S_1^2 + (n-1)S_2^2}{m + n - 2}\right\}^{1/2}\left(\dfrac{\alpha^2}{m} + \dfrac{\beta^2}{n}\right)^{1/2}},$$

where $S_1^2$ and $S_2^2$, respectively, denote the sample variances of the $X$'s and the $Y$'s?

3. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, and $k$ be a positive integer. Find $E(S^{2k})$. In particular, find $E(S^2)$ and $\operatorname{var}(S^2)$.

4. A random sample of 5 is taken from a normal population with mean 2.5 and variance $\sigma^2 = 36$.
(a) Find the probability that the sample variance lies between 30 and 44.
(b) Find the probability that the sample mean lies between 1.3 and 3.5, while the sample variance lies between 30 and 44.

5. The mean life of a sample of 10 light bulbs was observed to be 1327 hours with a standard deviation of 425 hours. A second sample of 6 bulbs chosen from a different batch showed a mean life of 1215 hours with a standard deviation of 375 hours. If the means of the two batches are assumed to be same, how probable is the observed difference between the two sample means?

6. Let $S_1^2$ and $S_2^2$ be the sample variances from two independent samples of sizes $n_1 = 5$ and $n_2 = 4$ from two populations having the same unknown variance $\sigma^2$. Find (approximately) the probability that $S_1^2/S_2^2 < 1/5.2$ or $> 6.25$.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. By using the Helmert orthogonal transformation defined in Remark 3, show that $\bar{X}$ and $S^2$ are independent.

8. Derive the joint PDF of $\bar{X}$ and $S^2$ by using the transformation described in Remark 4.


6.6 SAMPLING FROM A BIVARIATE NORMAL DISTRIBUTION


Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal population with parameters $\mu_1$, $\mu_2$, $\rho$, $\sigma_1^2$, $\sigma_2^2$. Let us write

$$\bar{X} = n^{-1}\sum_{i=1}^{n} X_i, \qquad \bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i,$$

$$S_1^2 = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2, \qquad S_2^2 = (n-1)^{-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2,$$

and

$$S_{11} = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}).$$

In this section we show that $(\bar{X}, \bar{Y})$ is independent of $(S_1^2, S_{11}, S_2^2)$ and obtain the distribution of the sample correlation coefficient and regression coefficients (at least in the special case where $\rho = 0$).


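Theorem 1. The random vectors $(\bar{X}, \bar{Y})$ and $(X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}, Y_1 - \bar{Y}, Y_2 - \bar{Y}, \ldots, Y_n - \bar{Y})$ are independent. The joint distribution of $(\bar{X}, \bar{Y})$ is bivariate normal with parameters $\mu_1$, $\mu_2$, $\rho$, $\sigma_1^2/n$, $\sigma_2^2/n$.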


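Proof. The proof follows along the lines of the proof of Theorem 6.5.1. The MGF of $(\bar{X}, \bar{Y}, X_1 - \bar{X}, \ldots, X_n - \bar{X}, Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y})$ is given by

$$M^* = M(u, v, t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n)$$

$$= E\exp\left\{u\bar{X} + v\bar{Y} + \sum_{i=1}^{n} t_i(X_i - \bar{X}) + \sum_{i=1}^{n} s_i(Y_i - \bar{Y})\right\}$$

$$= E\exp\left\{\sum_{i=1}^{n} X_i\left(\frac{u}{n} + t_i - \bar{t}\right) + \sum_{i=1}^{n} Y_i\left(\frac{v}{n} + s_i - \bar{s}\right)\right\},$$

where $\bar{t} = n^{-1}\sum_{i=1}^{n} t_i$, $\bar{s} = n^{-1}\sum_{i=1}^{n} s_i$. Therefore,

$$M^* = \prod_{i=1}^{n} E\exp\left\{\left(\frac{u}{n} + t_i - \bar{t}\right)X_i + \left(\frac{v}{n} + s_i - \bar{s}\right)Y_i\right\}$$

$$= \prod_{i=1}^{n}\exp\left\{\left(\frac{u}{n} + t_i - \bar{t}\right)\mu_1 + \left(\frac{v}{n} + s_i - \bar{s}\right)\mu_2 + \frac{\sigma_1^2[(u/n) + t_i - \bar{t}]^2 + 2\rho\sigma_1\sigma_2[(u/n) + t_i - \bar{t}][(v/n) + s_i - \bar{s}] + \sigma_2^2[(v/n) + s_i - \bar{s}]^2}{2}\right\}$$

$$= \exp\left(\mu_1u + \mu_2v + \frac{u^2\sigma_1^2 + 2\rho\sigma_1\sigma_2uv + v^2\sigma_2^2}{2n}\right)\exp\left\{\frac{\sigma_1^2}{2}\sum_{i=1}^{n}(t_i - \bar{t})^2 + \rho\sigma_1\sigma_2\sum_{i=1}^{n}(t_i - \bar{t})(s_i - \bar{s}) + \frac{\sigma_2^2}{2}\sum_{i=1}^{n}(s_i - \bar{s})^2\right\}$$

$$= M_1(u, v)\,M_2(t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n)$$

for all real $u, v, t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n$, where $M_1$ is the MGF of $(\bar{X}, \bar{Y})$ and $M_2$ is the MGF of $(X_1 - \bar{X}, \ldots, X_n - \bar{X}, Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y})$. Also, $M_1$ is the MGF of a bivariate normal distribution. This completes the proof.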


Corollary. The sample mean vector $(\bar{X}, \bar{Y})$ is independent of the sample variance–covariance matrix $\begin{pmatrix} S_1^2 & S_{11} \\ S_{11} & S_2^2 \end{pmatrix}$ in sampling from a bivariate normal population.


Remark 1. The result of Theorem 1 can be generalized to the case of sampling from a $k$-variate normal population. We do not propose to do so here.


Remark 2. Unfortunately the method of proof of Theorem 1 does not lead to the distribution of the variance–covariance matrix. The distribution of $(\bar{X}, \bar{Y}, S_1^2, S_{11}, S_2^2)$ was found by Fisher [30] and Romanovsky [92]. The general case is due to Wishart [119], who determined the distribution of the sample variance–covariance matrix in sampling from a $k$-dimensional normal distribution. The distribution is named after him.


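We will next compute the distribution of the sample correlation coefficient:

$$R = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\left\{\sum_{i=1}^{n}(X_i - \bar{X})^2\sum_{i=1}^{n}(Y_i - \bar{Y})^2\right\}^{1/2}} = \frac{S_{11}}{S_1S_2}. \tag{1}$$

It is convenient to introduce the so-called sample regression coefficient of $Y$ on $X$

$$B_{Y|X} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{S_{11}}{S_1^2} = R\,\frac{S_2}{S_1}. \tag{2}$$

Since we will need only the distribution of $R$ and $B_{Y|X}$ whenever $\rho = 0$, we will make this simplifying assumption in what follows. The general case is computationally quite complicated. We refer the reader to Cramér [17] for details.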

We note that

$$R = \frac{\sum_{i=1}^{n} Y_i(X_i - \bar{X})}{(n-1)S_1S_2} \tag{3}$$

and

$$B_{Y|X} = \frac{\sum_{i=1}^{n} Y_i(X_i - \bar{X})}{(n-1)S_1^2}. \tag{4}$$

Moreover,

$$R^2 = \frac{B_{Y|X}^2S_1^2}{S_2^2}. \tag{5}$$

In the following we write $B = B_{Y|X}$.


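Theorem 2. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$, $n > 2$, be a sample from a bivariate normal population with parameters $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \sigma_1^2$, $\operatorname{var}(Y) = \sigma_2^2$, and $\operatorname{cov}(X, Y) = 0$. In other words, let $X_1, X_2, \ldots, X_n$ be iid $N(\mu_1, \sigma_1^2)$ RVs, and $Y_1, Y_2, \ldots, Y_n$ be iid $N(\mu_2, \sigma_2^2)$ RVs, and suppose that the $X$'s and $Y$'s are independent. Then the PDF of $R$ is given by

$$f_1(r) = \begin{cases} \dfrac{\Gamma[(n-1)/2]}{\Gamma(1/2)\,\Gamma[(n-2)/2]}(1 - r^2)^{(n-4)/2}, & -1 \leq r \leq 1, \\ 0, & \text{otherwise;} \end{cases} \tag{6}$$

and the PDF of $B$ is given by

$$h_1(b) = \frac{\Gamma(n/2)}{\Gamma(1/2)\,\Gamma[(n-1)/2]}\cdot\frac{\sigma_1\sigma_2^{n-1}}{(\sigma_2^2 + \sigma_1^2b^2)^{n/2}}, \qquad -\infty < b < \infty. \tag{7}$$

Proof. Without any loss of generality, we assume that $\mu_1 = \mu_2 = 0$ and $\sigma_1^2 = \sigma_2^2 = 1$, for we can always define

$$X_i^* = \frac{X_i - \mu_1}{\sigma_1} \qquad \text{and} \qquad Y_i^* = \frac{Y_i - \mu_2}{\sigma_2}. \tag{8}$$

Now note that the conditional distribution of $Y_i$, given $X_1, X_2, \ldots, X_n$, is $N(0, 1)$, and $Y_1, Y_2, \ldots, Y_n$, given $X_1, X_2, \ldots, X_n$, are mutually independent. Let us define the following orthogonal transformation:

$$u_i = \sum_{j=1}^{n} c_{ij}y_j, \qquad i = 1, 2, \ldots, n, \tag{9}$$

where

$$c_{1j} = \frac{1}{\sqrt{n}}, \qquad j = 1, 2, \ldots, n, \tag{10}$$

$$c_{2j} = \frac{x_j - \bar{x}}{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{1/2}}, \qquad j = 1, 2, \ldots, n. \tag{11}$$

It follows from orthogonality that for any $i \geq 2$

$$\sum_{j=1}^{n} c_{ij} = \sqrt{n}\sum_{j=1}^{n} c_{ij}\frac{1}{\sqrt{n}} = \sqrt{n}\sum_{j=1}^{n} c_{ij}c_{1j} = 0 \tag{12}$$

and

$$\sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n}\left(\sum_{j=1}^{n} c_{ij}y_j\right)\left(\sum_{j'=1}^{n} c_{ij'}y_{j'}\right) = \sum_{j=1}^{n}\sum_{j'=1}^{n}\left(\sum_{i=1}^{n} c_{ij}c_{ij'}\right)y_jy_{j'} = \sum_{j=1}^{n} y_j^2. \tag{13}$$

Moreover,

$$u_1 = \sqrt{n}\,\bar{y} \tag{14}$$

and

$$u_2 = b\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \tag{15}$$

where $b$ is a value assumed by RV $B$. Also $U_1, U_2, \ldots, U_n$, given $X_1, X_2, \ldots, X_n$, are normal RVs (being linear combinations of the $Y$'s). Thus

$$E\{U_i \mid X_1, X_2, \ldots, X_n\} = \sum_{j=1}^{n} c_{ij}E\{Y_j \mid X_1, X_2, \ldots, X_n\} = 0$$

and

$$\operatorname{cov}\{U_i, U_k \mid X_1, X_2, \ldots, X_n\} = \operatorname{cov}\left\{\sum_{j=1}^{n} c_{ij}Y_j, \sum_{p=1}^{n} c_{kp}Y_p \,\Big|\, X_1, X_2, \ldots, X_n\right\}$$

$$= \sum_{j=1}^{n}\sum_{p=1}^{n} c_{ij}c_{kp}\operatorname{cov}\{Y_j, Y_p \mid X_1, X_2, \ldots, X_n\} = \sum_{j=1}^{n} c_{ij}c_{kj}. \tag{16}$$

This last equality follows since

$$\operatorname{cov}\{Y_j, Y_p \mid X_1, X_2, \ldots, X_n\} = \begin{cases} 0, & j \neq p, \\ 1, & j = p. \end{cases}$$

From orthogonality, we have

$$\operatorname{cov}\{U_i, U_k \mid X_1, X_2, \ldots, X_n\} = \begin{cases} 0, & i \neq k, \\ 1, & i = k, \end{cases} \tag{17}$$

and it follows that the RVs $U_1, U_2, \ldots, U_n$, given $X_1, X_2, \ldots, X_n$, are mutually independent $N(0, 1)$. Now

$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 = \sum_{i=1}^{n} U_i^2 - U_1^2 = \sum_{i=2}^{n} U_i^2. \tag{18}$$

Thus

$$R^2 = \frac{U_2^2}{\sum_{i=2}^{n} U_i^2} = \frac{U_2^2}{U_2^2 + \sum_{i=3}^{n} U_i^2}. \tag{19}$$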

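Writing $U = U_2^2$ and $W = \sum_{i=3}^{n} U_i^2$, we see that the conditional distribution of $U$, given $X_1, X_2, \ldots, X_n$, is $\chi^2(1)$, and that of $W$, given $X_1, X_2, \ldots, X_n$, is $\chi^2(n-2)$. Moreover $U$ and $W$ are independent. Since these conditional distributions do not involve the $X$'s, we see that $U$ and $W$ are unconditionally independent with $\chi^2(1)$ and $\chi^2(n-2)$ distributions, respectively. The joint PDF of $U$ and $W$ is

$$f(u, w) = \frac{1}{\Gamma(\frac{1}{2})\sqrt{2}}\,u^{1/2-1}e^{-u/2}\cdot\frac{1}{\Gamma[(n-2)/2]\,2^{(n-2)/2}}\,w^{(n-2)/2-1}e^{-w/2}.$$

Let $u + w = z$, then $u = r^2z$ and $w = z(1 - r^2)$. The Jacobian of this transformation is $z$, so that the joint PDF of $R^2$ and $Z$ is given by

$$\frac{1}{\Gamma(\frac{1}{2})\,\Gamma[(n-2)/2]\,2^{(n-1)/2}}\,z^{n/2-3/2}e^{-z/2}(r^2)^{-1/2}(1 - r^2)^{n/2-2}.$$

The marginal PDF of $R^2$ is easily computed as

$$f_2(r^2) = \frac{\Gamma[(n-1)/2]}{\Gamma(\frac{1}{2})\,\Gamma[(n-2)/2]}(r^2)^{-1/2}(1 - r^2)^{n/2-2}, \qquad 0 \leq r^2 \leq 1. \tag{20}$$

Finally, using Theorem 2.5.4, we get the PDF of $R$ as

$$f_1(r) = \frac{\Gamma[(n-1)/2]}{\Gamma(\frac{1}{2})\,\Gamma[(n-2)/2]}(1 - r^2)^{n/2-2}, \qquad -1 \leq r \leq 1.$$

As for the distribution of $B$, note that the conditional PDF of $U_2 = \sqrt{n-1}\,BS_1$, given $X_1, X_2, \ldots, X_n$, is $N(0, 1)$, so that the conditional PDF of $B$, given $X_1, X_2, \ldots, X_n$, is $N(0, 1/\sum_{i=1}^{n}(x_i - \bar{x})^2)$. Let us write $\Lambda = (n-1)S_1^2$. Then the PDF of RV $\Lambda$ is that of a $\chi^2(n-1)$ RV. Thus the joint PDF of $B$ and $\Lambda$ is given by

$$h(b, \lambda) = g(b \mid \lambda)\,h_2(\lambda),$$

where $g(b \mid \lambda)$ is $N(0, 1/\lambda)$, and $h_2(\lambda)$ is $\chi^2(n-1)$. We have

$$h_1(b) = \int_0^{\infty} h(b, \lambda)\,d\lambda = \frac{1}{2^{n/2}\,\Gamma(\frac{1}{2})\,\Gamma[(n-1)/2]}\int_0^{\infty}\lambda^{n/2-1}e^{-(\lambda/2)(1 + b^2)}\,d\lambda$$

$$= \frac{\Gamma(n/2)}{\Gamma(\frac{1}{2})\,\Gamma[(n-1)/2]}\cdot\frac{1}{(1 + b^2)^{n/2}}, \qquad -\infty < b < \infty. \tag{22}$$

To complete the proof let us write

$$X_i = \mu_1 + X_i^*\sigma_1 \qquad \text{and} \qquad Y_i = \mu_2 + Y_i^*\sigma_2,$$

where $X_i^* \sim N(0, 1)$ and $Y_i^* \sim N(0, 1)$. Then $X_i \sim N(\mu_1, \sigma_1^2)$, $Y_i \sim N(\mu_2, \sigma_2^2)$, and

$$R = \frac{\sum_{i=1}^{n}(X_i^* - \bar{X}^*)(Y_i^* - \bar{Y}^*)}{\left\{\sum_{i=1}^{n}(X_i^* - \bar{X}^*)^2\sum_{i=1}^{n}(Y_i^* - \bar{Y}^*)^2\right\}^{1/2}} = R^*, \tag{23}$$

so that the PDF of $R$ is the same as derived above. Also

$$B = \frac{\sigma_1\sigma_2\sum_{i=1}^{n}(X_i^* - \bar{X}^*)(Y_i^* - \bar{Y}^*)}{\sigma_1^2\sum_{i=1}^{n}(X_i^* - \bar{X}^*)^2} = \frac{\sigma_2}{\sigma_1}B^*, \tag{24}$$

where the PDF of $B^*$ is given by (22). Relations (22) and (24) are used to find the PDF of $B$. We leave the reader to carry out these simple details.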


Remark 3. In view of (23), namely the invariance of $R$ under translation and (positive) scale changes, we note that for fixed $n$ the sampling distribution of $R$, under $\rho = 0$, does not depend on $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$. In the general case when $\rho \neq 0$, one can show that for fixed $n$ the distribution of $R$ depends only on $\rho$ but not on $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$ (see, for example, Cramér [17], p. 398).


Remark 4. Let us change the variable to

$$T = \frac{R}{\sqrt{1 - R^2}}\sqrt{n - 2}. \tag{25}$$

Then

$$1 - R^2 = \left(1 + \frac{T^2}{n-2}\right)^{-1},$$

and the PDF of $T$ is given by

$$p(t) = \frac{1}{\sqrt{n-2}}\cdot\frac{1}{B[(n-2)/2, \frac{1}{2}]}\cdot\frac{1}{[1 + t^2/(n-2)]^{(n-1)/2}}, \tag{26}$$

which is the PDF of a t-statistic with $n - 2$ d.f. Thus $T$ defined by (25) has a $t(n-2)$ distribution, provided that $\rho = 0$. This result facilitates the computation of probabilities under the PDF of $R$ when $\rho = 0$.


Remark 5. To compute the PDF of $B_{X|Y} = R(S_1/S_2)$, the so-called sample regression coefficient of $X$ on $Y$, all we need to do is to interchange $\sigma_1$ and $\sigma_2$ in (7).


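Remark 6. From (7) we can compute the mean and variance of $B$. For $n > 2$, clearly

$$EB = 0,$$

and for $n > 3$, we can show that

$$EB^2 = \operatorname{var}(B) = \frac{\sigma_2^2}{\sigma_1^2}\, \frac{1}{n-3}.$$

Similarly, we can use (6) to compute the mean and variance of $R$. We have, for $n > 4$, under $\rho = 0$,

$$ER = 0$$

and

$$ER^2 = \operatorname{var}(R) = \frac{1}{n-1}.$$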


PROBLEMS 6.6 


1. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate normal population with $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \operatorname{var}(Y) = \sigma^2$, and $\operatorname{cov}(X,Y) = \rho\sigma^2$. Let $\bar{X}, \bar{Y}$ denote the corresponding sample means, $S_1^2, S_2^2$ the corresponding sample variances, and $S_{11}$ the sample covariance. Write $R = 2S_{11}/(S_1^2 + S_2^2)$. Show that the PDF of $R$ is given by

$$f(r) = \frac{\Gamma(n/2)}{\sqrt{\pi}\,\Gamma[(n-1)/2]}\, (1-\rho^2)^{(n-1)/2} (1-\rho r)^{-(n-1)} (1-r^2)^{(n-3)/2}, \qquad |r| < 1.$$

(Rastogi [89])
[Hint: Let $U = (X+Y)/2$ and $V = (X-Y)/2$, and observe that the random vector $(U,V)$ is also bivariate normal. In fact, $U$ and $V$ are independent.]
2. Let $X$ and $Y$ be independent normal RVs. A sample of $n = 11$ observations on $(X, Y)$ produces sample correlation coefficient $r = 0.40$. Find the probability of obtaining a value of $R$ that exceeds the observed value.

3. Let $X_1, X_2$ be jointly normally distributed with zero means, unit variances, and correlation coefficient $\rho$. Let $S$ be a $\chi^2(n)$ RV that is independent of $(X_1, X_2)$. Then the joint distribution of $Y_1 = X_1/\sqrt{S/n}$ and $Y_2 = X_2/\sqrt{S/n}$ is known as a central bivariate t-distribution. Find the joint PDF of $(Y_1, Y_2)$ and the marginal PDFs of $Y_1$ and $Y_2$, respectively.

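4. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal distribution with parameters $EX_i = \mu_1$, $EY_i = \mu_2$, $\operatorname{var}(X_i) = \operatorname{var}(Y_i) = \sigma^2$, and $\operatorname{cov}(X_i, Y_i) = \rho\sigma^2$, $i = 1, 2, \ldots, n$. Find the distribution of the statistic

$$T(X, Y) = \sqrt{n}\, \frac{(\bar{X} - \mu_1) - (\bar{Y} - \mu_2)}{\sqrt{\sum_{i=1}^n (X_i - Y_i - \bar{X} + \bar{Y})^2}}.$$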


BASIC ASYMPTOTICS: LARGE SAMPLE 
THEORY 


7.1. INTRODUCTION 


In Chapter 6 we described some methods of finding exact distributions of sample statistics and their moments. While these methods are used in some cases, such as sampling from a normal population when the sample statistic of interest is $\bar{X}$ or $S^2$, often the statistic of interest, say $T_n = T(X_1, \ldots, X_n)$, is either too complicated or its exact distribution is not simple to work with. In such cases we are interested in the convergence properties of $T_n$. We want to know what happens when the sample size is large. What is the limiting distribution of $T_n$? When the exact distribution of $T_n$ (and its moments) is unknown or too complicated, we will often use asymptotic approximations that are valid when $n$ is large.

In this chapter, we discuss some basic elements of statistical asymptotics. In Section 7.2 we discuss various modes of convergence of a sequence of random variables. In Sections 7.3 and 7.4 the laws of large numbers are discussed. Section 7.5 deals with limiting moment generating functions, and in Section 7.6 we discuss one of the most fundamental theorems of classical statistics, the central limit theorem. In Section 7.7 we consider some statistical applications of these methods.

The reader may find some parts of this chapter a bit difficult on first reading. Such a discussion has been indicated with a †.


7.2 MODES OF CONVERGENCE 


In this section we consider several modes of convergence and investigate their interrela- 
tionships. We begin with the weakest mode of convergence. 
Definition 1. Let $\{F_n\}$ be a sequence of distribution functions. If there exists a DF $F$ such that, as $n \to \infty$,

$$F_n(x) \to F(x) \tag{1}$$

at every point $x$ at which $F$ is continuous, we say that $F_n$ converges in law (or, weakly) to $F$, and we write $F_n \xrightarrow{w} F$.
If $\{X_n\}$ is a sequence of RVs and $\{F_n\}$ is the corresponding sequence of DFs, we say that $X_n$ converges in distribution (or law) to $X$ if there exists an RV $X$ with DF $F$ such that $F_n \xrightarrow{w} F$. We write $X_n \xrightarrow{L} X$.


It must be remembered that it is quite possible for a given sequence of DFs to converge to a function that is not a DF.


Example 1. Consider the sequence of DFs

$$F_n(x) = \begin{cases} 0, & x < n, \\ 1, & x \ge n. \end{cases}$$

Here $F_n(x)$ is the DF of the RV $X_n$ degenerate at $x = n$. We see that $F_n(x)$ converges to a function $F$ that is identically equal to 0, and hence it is not a DF.


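Example 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common density function

$$f(x) = \begin{cases} \dfrac{1}{\theta}, & 0 < x < \theta \quad (0 < \theta < \infty), \\ 0, & \text{otherwise}. \end{cases}$$

Let $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$. Then the density function of $X_{(n)}$ is

$$f_n(x) = \begin{cases} \dfrac{n x^{n-1}}{\theta^n}, & 0 < x < \theta, \\ 0, & \text{otherwise}, \end{cases}$$

and the DF of $X_{(n)}$ is

$$F_n(x) = \begin{cases} 0, & x < 0, \\ (x/\theta)^n, & 0 \le x < \theta, \\ 1, & x \ge \theta. \end{cases}$$

We see that, as $n \to \infty$,

$$F_n(x) \to F(x) = \begin{cases} 0, & x < \theta, \\ 1, & x \ge \theta, \end{cases}$$

which is a DF. Thus $F_n \xrightarrow{w} F$.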
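The convergence in Example 2 can be checked numerically. The sketch below is illustrative only (the function names are hypothetical); it evaluates the DF of the maximum of $n$ iid $U(0,\theta)$ RVs and the degenerate limit DF at a few points.

```python
def cdf_max_uniform(x, n, theta=1.0):
    """DF of the maximum of n iid U(0, theta) RVs, as in Example 2."""
    if x < 0:
        return 0.0
    if x < theta:
        return (x / theta) ** n
    return 1.0

def limit_cdf(x, theta=1.0):
    """DF of the RV degenerate at theta (the weak limit)."""
    return 1.0 if x >= theta else 0.0

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, [round(cdf_max_uniform(x, n), 6) for x in (0.5, 0.9, 0.99, 1.0)])
```

For any fixed $x < \theta$ the printed values shrink toward 0 as $n$ grows, while the value at $x = \theta$ stays at 1, which matches the limit DF.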
Example 3. Let $F_n$ be a sequence of DFs defined by

$$F_n(x) = \begin{cases} 0, & x < 0, \\ 1 - \dfrac{1}{n}, & 0 \le x < n, \\ 1, & n \le x. \end{cases}$$

Clearly $F_n \xrightarrow{w} F$, where $F$ is the DF given by

$$F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0. \end{cases}$$

Note that $F_n$ is the DF of the RV $X_n$ with PMF

$$P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = n\} = \frac{1}{n},$$

and $F$ is the DF of the RV $X$ degenerate at 0. We have

$$EX_n^k = n^k \left(\frac{1}{n}\right) = n^{k-1},$$

where $k$ is a positive integer. Also $EX^k = 0$, so that

$$EX_n^k \nrightarrow EX^k \quad \text{for any } k \ge 1.$$


We next give an example to show that weak convergence of distribution functions does not imply the convergence of corresponding PMFs or PDFs.


Example 4. Let $\{X_n\}$ be a sequence of RVs with PMF

$$f_n(x) = P\{X_n = x\} = \begin{cases} 1 & \text{if } x = 2 + 1/n, \\ 0 & \text{otherwise}. \end{cases}$$

Note that none of the $f_n$'s assigns any probability to the point $x = 2$. It follows that

$$f_n(x) \to f(x) \quad \text{as } n \to \infty,$$

where $f(x) = 0$ for all $x$. However, the sequence of DFs $\{F_n\}$ of RVs $X_n$ converges to the function

$$F(x) = \begin{cases} 0, & x < 2, \\ 1, & x \ge 2, \end{cases}$$

at all continuity points of $F$. Since $F$ is the DF of the RV degenerate at $x = 2$, $F_n \xrightarrow{w} F$.

The following result is easy to prove.

Theorem 1. Let $X_n$ be a sequence of integer-valued RVs. Also, let $f_n(k) = P\{X_n = k\}$, $k = 0, 1, 2, \ldots$, be the PMF of $X_n$, $n = 1, 2, \ldots$, and $f(k) = P\{X = k\}$ be the PMF of $X$. Then

$$f_n(x) \to f(x) \ \text{ for all } x \ \Longrightarrow\ X_n \xrightarrow{L} X.$$

In the continuous case we state the following result of Scheffé [100] without proof.


Theorem 2. Let $X_n$, $n = 1, 2, \ldots$, and $X$ be continuous RVs such that

$$f_n(x) \to f(x) \quad \text{for (almost) all } x \text{ as } n \to \infty.$$

Here, $f_n$ and $f$ are the PDFs of $X_n$ and $X$, respectively. Then $X_n \xrightarrow{L} X$.

The following result is easy to establish.


Theorem 3. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{L} X$, and let $c$ be a constant. Then

(a) $X_n + c \xrightarrow{L} X + c$,
(b) $cX_n \xrightarrow{L} cX$, $c \neq 0$.


A slightly stronger concept of convergence is defined by convergence in probability.


Definition 2. Let $\{X_n\}$ be a sequence of RVs defined on some probability space $(\Omega, \mathcal{S}, P)$. We say that the sequence $\{X_n\}$ converges in probability to the RV $X$ if, for every $\varepsilon > 0$,

$$P\{|X_n - X| > \varepsilon\} \to 0 \quad \text{as } n \to \infty. \tag{2}$$

We write $X_n \xrightarrow{P} X$.


Remark 1. We emphasize that the definition says nothing about the convergence of the RVs $X_n$ to the RV $X$ in the sense in which it is understood in real analysis. Thus $X_n \xrightarrow{P} X$ does not imply that, given $\varepsilon > 0$, we can find an $N$ such that $|X_n - X| < \varepsilon$ for $n \ge N$. Definition 2 speaks only of the convergence of the sequence of probabilities $P\{|X_n - X| > \varepsilon\}$ to 0.


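Example 5. Let $\{X_n\}$ be a sequence of RVs with PMF

$$P\{X_n = 1\} = \frac{1}{n}, \qquad P\{X_n = 0\} = 1 - \frac{1}{n}.$$

Then

$$P\{|X_n| > \varepsilon\} = \begin{cases} P\{X_n = 1\} = \dfrac{1}{n} & \text{if } 0 < \varepsilon < 1, \\ 0 & \text{if } \varepsilon \ge 1. \end{cases}$$

It follows that $P\{|X_n| > \varepsilon\} \to 0$ as $n \to \infty$, and we conclude that $X_n \xrightarrow{P} 0$.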


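The following statements can be verified.


1. $X_n \xrightarrow{P} X \iff X_n - X \xrightarrow{P} 0$.

2. $X_n \xrightarrow{P} X$, $X_n \xrightarrow{P} Y \Rightarrow P\{X = Y\} = 1$, for $P\{|X - Y| > c\} \le P\{|X_n - X| > c/2\} + P\{|X_n - Y| > c/2\}$, and it follows that $P\{|X - Y| > c\} = 0$ for every $c > 0$.

3. $X_n \xrightarrow{P} X \Rightarrow X_n - X_m \xrightarrow{P} 0$ as $n, m \to \infty$, for

$$P\{|X_n - X_m| > \varepsilon\} \le P\left\{|X_n - X| > \frac{\varepsilon}{2}\right\} + P\left\{|X_m - X| > \frac{\varepsilon}{2}\right\}.$$

4. $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y \Rightarrow X_n \pm Y_n \xrightarrow{P} X \pm Y$.

5. $X_n \xrightarrow{P} X$, $k$ constant, $\Rightarrow kX_n \xrightarrow{P} kX$.

6. $X_n \xrightarrow{P} k \Rightarrow X_n^2 \xrightarrow{P} k^2$.

7. $X_n \xrightarrow{P} a$, $Y_n \xrightarrow{P} b$, $a, b$ constants $\Rightarrow X_n Y_n \xrightarrow{P} ab$, for

$$X_n Y_n = \frac{(X_n + Y_n)^2}{4} - \frac{(X_n - Y_n)^2}{4} \xrightarrow{P} \frac{(a+b)^2}{4} - \frac{(a-b)^2}{4} = ab.$$

8. $X_n \xrightarrow{P} 1 \Rightarrow X_n^{-1} \xrightarrow{P} 1$, for

$$P\left\{\left|\frac{1}{X_n} - 1\right| \ge \varepsilon\right\} = P\left\{\frac{1}{X_n} \ge 1 + \varepsilon\right\} + P\left\{\frac{1}{X_n} \le 0\right\} + P\left\{0 < \frac{1}{X_n} \le 1 - \varepsilon\right\},$$

and each of the three terms on the right goes to 0 as $n \to \infty$.

9. $X_n \xrightarrow{P} a$, $Y_n \xrightarrow{P} b$, $a, b$ constants, $b \neq 0 \Rightarrow X_n Y_n^{-1} \xrightarrow{P} ab^{-1}$.

10. $X_n \xrightarrow{P} X$, and $Y$ an RV $\Rightarrow X_n Y \xrightarrow{P} XY$.
Note that $Y$ is an RV so that, given $\delta > 0$, there exists a $k > 0$ such that $P\{|Y| > k\} < \delta/2$. Thus

$$P\{|X_n Y - XY| > \varepsilon\} = P\{|X_n - X||Y| > \varepsilon, |Y| > k\} + P\{|X_n - X||Y| > \varepsilon, |Y| \le k\} < \frac{\delta}{2} + P\left\{|X_n - X| > \frac{\varepsilon}{k}\right\}.$$

11. $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y \Rightarrow X_n Y_n \xrightarrow{P} XY$, for

$$(X_n - X)(Y_n - Y) \xrightarrow{P} 0.$$

The result now follows on multiplication, using result 10. It also follows that $X_n \xrightarrow{P} X \Rightarrow X_n^2 \xrightarrow{P} X^2$.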

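Theorem 4. Let $X_n \xrightarrow{P} X$ and $g$ be a continuous function defined on $\mathbb{R}$. Then $g(X_n) \xrightarrow{P} g(X)$ as $n \to \infty$.


Proof. Since $X$ is an RV, we can, given $\varepsilon > 0$, find a constant $k = k(\varepsilon)$ such that

$$P\{|X| > k\} < \frac{\varepsilon}{2}.$$

Also, $g$ is continuous on $\mathbb{R}$, so that $g$ is uniformly continuous on $[-k, k]$. It follows that there exists a $\delta = \delta(\varepsilon, k)$ such that

$$|g(x_n) - g(x)| < \varepsilon$$

whenever $|x| \le k$ and $|x_n - x| < \delta$. Let

$$A = \{|X| \le k\}, \quad B = \{|X_n - X| < \delta\}, \quad C = \{|g(X_n) - g(X)| < \varepsilon\}.$$

Then $\omega \in A \cap B \Rightarrow \omega \in C$, so that

$$A \cap B \subseteq C.$$

It follows that

$$P\{C^c\} \le P\{A^c\} + P\{B^c\},$$

that is,

$$P\{|g(X_n) - g(X)| \ge \varepsilon\} \le P\{|X_n - X| \ge \delta\} + P\{|X| > k\} < \varepsilon$$

for $n \ge N(\varepsilon, \delta, k)$, where $N(\varepsilon, \delta, k)$ is chosen so that

$$P\{|X_n - X| \ge \delta\} < \frac{\varepsilon}{2} \quad \text{for } n \ge N(\varepsilon, \delta, k).$$


Corollary. $X_n \xrightarrow{P} c$, where $c$ is a constant $\Rightarrow g(X_n) \xrightarrow{P} g(c)$, $g$ being a continuous function.


We remark that a more general result than Theorem 4 is true and state it without proof (see Rao [88, p. 124]): $X_n \xrightarrow{L} X$ and $g$ continuous on $\mathbb{R} \Rightarrow g(X_n) \xrightarrow{L} g(X)$.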

The following two theorems explain the relationship between weak convergence and 
convergence in probability. 
Theorem 5. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{L} X$.

Proof. Let $F_n$ and $F$, respectively, be the DFs of $X_n$ and $X$. We have

$$\{\omega : X(\omega) \le x'\} = \{\omega : X_n(\omega) \le x, X(\omega) \le x'\} \cup \{\omega : X_n(\omega) > x, X(\omega) \le x'\} \subseteq \{X_n \le x\} \cup \{X_n > x, X \le x'\}.$$

It follows that

$$F(x') \le F_n(x) + P\{X_n > x, X \le x'\}.$$

Since $X_n - X \xrightarrow{P} 0$, we have for $x' < x$

$$P\{X_n > x, X \le x'\} \le P\{|X_n - X| > x - x'\} \to 0 \quad \text{as } n \to \infty.$$

Therefore

$$F(x') \le \liminf_{n \to \infty} F_n(x), \qquad x' < x.$$

Similarly, by interchanging $X$ and $X_n$, and $x$ and $x'$, we get

$$\limsup_{n \to \infty} F_n(x) \le F(x''), \qquad x < x''.$$

Thus, for $x' < x < x''$, we have

$$F(x') \le \liminf_{n \to \infty} F_n(x) \le \limsup_{n \to \infty} F_n(x) \le F(x'').$$

Since $F$ has only a countable number of discontinuity points, we choose $x$ to be a point of continuity of $F$, and letting $x'' \downarrow x$ and $x' \uparrow x$, we have

$$F(x) = \lim_{n \to \infty} F_n(x)$$

at all points of continuity of $F$.


Theorem 6. Let $k$ be a constant. Then

$$X_n \xrightarrow{L} k \Rightarrow X_n \xrightarrow{P} k.$$


Proof. The proof is left as an exercise.


Corollary. Let $k$ be a constant. Then

$$X_n \xrightarrow{L} k \iff X_n \xrightarrow{P} k.$$
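Remark 2. We emphasize that we cannot improve the above result by replacing $k$ by an RV, that is, $X_n \xrightarrow{L} X$ in general does not imply $X_n \xrightarrow{P} X$, for let $X, X_1, X_2, \ldots$ be identically distributed RVs, and let the joint distribution of $(X_n, X)$ be as follows:

$$\begin{array}{c|cc|c}
X_n \backslash X & 0 & 1 & \\ \hline
0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} \\
1 & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ \hline
 & \tfrac{1}{2} & \tfrac{1}{2} & 1
\end{array}$$

Clearly, $X_n \xrightarrow{L} X$. But

$$P\left\{|X_n - X| > \tfrac{1}{2}\right\} = P\{|X_n - X| = 1\} = P\{X_n = 0, X = 1\} + P\{X_n = 1, X = 0\} = 1 \nrightarrow 0.$$

Hence, $X_n \xrightarrow{L} X$, but $X_n \overset{P}{\nrightarrow} X$.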


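Remark 3. Example 3 shows that $X_n \xrightarrow{L} X$ does not imply $EX_n^k \to EX^k$ for any $k > 0$, $k$ integral.


Definition 3. Let $\{X_n\}$ be a sequence of RVs such that $E|X_n|^r < \infty$, for some $r > 0$. We say that $X_n$ converges in the $r$th mean to an RV $X$ if $E|X|^r < \infty$ and

$$E|X_n - X|^r \to 0 \quad \text{as } n \to \infty, \tag{3}$$

and we write $X_n \xrightarrow{r} X$.


Example 6. Let $\{X_n\}$ be a sequence of RVs defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = 1\} = \frac{1}{n}, \qquad n = 1, 2, \ldots.$$

Then

$$E|X_n|^2 = \frac{1}{n} \to 0 \quad \text{as } n \to \infty,$$

and we see that $X_n \xrightarrow{2} X$, where RV $X$ is degenerate at 0.

Theorem 7. Let $X_n \xrightarrow{r} X$ for some $r > 0$. Then $X_n \xrightarrow{P} X$.

Proof. The proof is left as an exercise.

Example 7. Let $\{X_n\}$ be a sequence of RVs defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n^r}, \qquad P\{X_n = n\} = \frac{1}{n^r}, \qquad r > 0, \quad n = 1, 2, \ldots.$$

Then $E|X_n|^r = 1$, so that $X_n \overset{r}{\nrightarrow} 0$. We show that $X_n \xrightarrow{P} 0$:

$$P\{|X_n| > \varepsilon\} = \begin{cases} P\{X_n = n\} = \dfrac{1}{n^r} & \text{if } \varepsilon < n, \\ 0 & \text{if } \varepsilon \ge n, \end{cases} \ \to 0 \quad \text{as } n \to \infty.$$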


Theorem 8. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{2} X$. Then $EX_n \to EX$ and $EX_n^2 \to EX^2$ as $n \to \infty$.

Proof. We have

$$|E(X_n - X)| \le E|X_n - X| \le E^{1/2}|X_n - X|^2 \to 0 \quad \text{as } n \to \infty.$$

To see that $EX_n^2 \to EX^2$ (see also Theorem 9), we write

$$EX_n^2 = E(X_n - X)^2 + EX^2 + 2E\{X(X_n - X)\}$$

and note that

$$|E\{X(X_n - X)\}| \le \sqrt{EX^2\, E(X_n - X)^2}$$

by the Cauchy–Schwarz inequality. The result follows on passing to the limits.

We get, in addition, that $X_n \xrightarrow{2} X$ implies $\operatorname{var}(X_n) \to \operatorname{var}(X)$.


Corollary. Let $\{X_m\}$, $\{Y_n\}$ be two sequences of RVs such that $X_m \xrightarrow{2} X$, $Y_n \xrightarrow{2} Y$. Then $E(X_m Y_n) \to E(XY)$ as $m, n \to \infty$.


Proof. The proof is left to the reader.


As a simple consequence of Theorem 8 and its corollary we see that $X_n \xrightarrow{2} X$, $Y_n \xrightarrow{2} Y$ together imply $\operatorname{cov}(X_n, Y_n) \to \operatorname{cov}(X, Y)$.


Theorem 9. If $X_n \xrightarrow{r} X$, then $E|X_n|^r \to E|X|^r$.

Proof. Let $0 < r \le 1$. Then

$$E|X_n|^r = E|X_n - X + X|^r \le E|X_n - X|^r + E|X|^r,$$

so that

$$E|X_n|^r - E|X|^r \le E|X_n - X|^r.$$

Interchanging $X_n$ and $X$, we get

$$E|X|^r - E|X_n|^r \le E|X_n - X|^r.$$

It follows that

$$\bigl|E|X|^r - E|X_n|^r\bigr| \le E|X_n - X|^r \to 0 \quad \text{as } n \to \infty.$$

For $r > 1$, we use Minkowski's inequality and obtain

$$[E|X_n|^r]^{1/r} \le [E|X_n - X|^r]^{1/r} + [E|X|^r]^{1/r}$$

and

$$[E|X|^r]^{1/r} \le [E|X_n - X|^r]^{1/r} + [E|X_n|^r]^{1/r}.$$

It follows that

$$\bigl|E^{1/r}|X_n|^r - E^{1/r}|X|^r\bigr| \le E^{1/r}|X_n - X|^r \to 0 \quad \text{as } n \to \infty.$$

This completes the proof.

Theorem 10. Let $r > s$. Then $X_n \xrightarrow{r} X \Rightarrow X_n \xrightarrow{s} X$.

Proof. From Theorem 3.4.3 it follows that for $s < r$

$$E|X_n - X|^s \le [E|X_n - X|^r]^{s/r} \to 0 \quad \text{as } n \to \infty,$$

since $X_n \xrightarrow{r} X$.


Remark 4. Clearly the converse to Theorem 10 cannot hold, since $E|X|^s < \infty$ for $s < r$ does not imply $E|X|^r < \infty$.


Remark 5. In view of Theorem 9, it follows that $X_n \xrightarrow{r} X \Rightarrow E|X_n|^s \to E|X|^s$ for $s \le r$.


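Definition 4.† Let $\{X_n\}$ be a sequence of RVs. We say that $X_n$ converges almost surely (a.s.) to an RV $X$ if and only if

$$P\{\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\} = 1, \tag{4}$$

and we write $X_n \xrightarrow{\text{a.s.}} X$ or $X_n \to X$ with probability 1.

† May be omitted on the first reading.

The following result elucidates Definition 4.


Theorem 11. $X_n \xrightarrow{\text{a.s.}} X$ if and only if $\lim_{n \to \infty} P\{\sup_{m \ge n} |X_m - X| > \varepsilon\} = 0$ for all $\varepsilon > 0$.


Proof. Since $X_n \xrightarrow{\text{a.s.}} X \iff X_n - X \xrightarrow{\text{a.s.}} 0$, it will be sufficient to show the equivalence of

(a) $X_n \xrightarrow{\text{a.s.}} 0$ and
(b) $\lim_{n \to \infty} P\{\sup_{m \ge n} |X_m| > \varepsilon\} = 0$.

Let us suppose that (a) holds. Let $\varepsilon > 0$, and write

$$A_n(\varepsilon) = \Bigl\{\sup_{m \ge n} |X_m| > \varepsilon\Bigr\} \quad \text{and} \quad C = \Bigl\{\lim_{n \to \infty} X_n = 0\Bigr\}.$$

Also write $B_n(\varepsilon) = C \cap A_n(\varepsilon)$, and note that $B_{n+1}(\varepsilon) \subseteq B_n(\varepsilon)$, and the limit set $\bigcap_{n=1}^{\infty} B_n(\varepsilon) = \emptyset$. It follows that

$$\lim_{n \to \infty} PB_n(\varepsilon) = P\Bigl\{\bigcap_{n=1}^{\infty} B_n(\varepsilon)\Bigr\} = 0.$$

Since $PC = 1$, $PC^c = 0$, we have

$$\begin{aligned}
PB_n(\varepsilon) = P(A_n \cap C) &= 1 - P(C^c \cup A_n^c) \\
&= 1 - PC^c - PA_n^c + P(C^c \cap A_n^c) \\
&= PA_n + P(C^c \cap A_n^c) \\
&= PA_n.
\end{aligned}$$

It follows that (b) holds.
Conversely, let $\lim_{n \to \infty} PA_n(\varepsilon) = 0$, and write

$$D(\varepsilon) = \Bigl\{\limsup_{n \to \infty} |X_n| > \varepsilon > 0\Bigr\}.$$

Since $D(\varepsilon) \subseteq A_n(\varepsilon)$ for $n = 1, 2, \ldots$, it follows that $PD(\varepsilon) = 0$. Also,

$$C^c = \Bigl\{\lim_{n \to \infty} X_n \neq 0\Bigr\} \subseteq \bigcup_{k=1}^{\infty} \Bigl\{\limsup_{n \to \infty} |X_n| > \frac{1}{k}\Bigr\},$$

so that

$$1 - PC \le \sum_{k=1}^{\infty} PD\left(\frac{1}{k}\right) = 0,$$

and (a) holds.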


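Remark 6. Thus $X_n \xrightarrow{\text{a.s.}} 0$ means that, for $\varepsilon > 0$, $\eta > 0$ arbitrary, we can find an $n_0$ such that

$$P\Bigl\{\sup_{n \ge n_0} |X_n| > \varepsilon\Bigr\} < \eta. \tag{5}$$

Indeed, we can write, equivalently, that

$$\lim_{n_0 \to \infty} P\Bigl[\bigcup_{n \ge n_0} \{|X_n| > \varepsilon\}\Bigr] = 0. \tag{6}$$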
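Theorem 12. $X_n \xrightarrow{\text{a.s.}} X \Rightarrow X_n \xrightarrow{P} X$.


Proof. By Remark 6, $X_n \xrightarrow{\text{a.s.}} X$ implies that, for arbitrary $\varepsilon > 0$, $\eta > 0$, we can choose an $n_0 = n_0(\varepsilon, \eta)$ such that

$$P\Bigl[\bigcap_{n = n_0}^{\infty} \{|X_n - X| \le \varepsilon\}\Bigr] \ge 1 - \eta.$$

Clearly,

$$\bigcap_{j = n_0}^{\infty} \{|X_j - X| \le \varepsilon\} \subseteq \{|X_n - X| \le \varepsilon\} \quad \text{for } n \ge n_0.$$

It follows that for $n \ge n_0$

$$P\{|X_n - X| \le \varepsilon\} \ge P\Bigl[\bigcap_{j = n_0}^{\infty} \{|X_j - X| \le \varepsilon\}\Bigr] \ge 1 - \eta,$$

that is,

$$P\{|X_n - X| > \varepsilon\} < \eta \quad \text{for } n \ge n_0,$$

which is the same as saying $X_n \xrightarrow{P} X$.


That the converse of Theorem 12 does not hold is shown in the following example.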


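Example 8. For each positive integer $n$ there exist integers $m$ and $k$ (uniquely determined) such that

$$n = 2^k + m, \qquad 0 \le m < 2^k, \quad k = 0, 1, 2, \ldots.$$

Thus, for $n = 1$, $k = 0$ and $m = 0$; for $n = 5$, $k = 2$ and $m = 1$; and so on. Define RVs $X_n$, for $n = 1, 2, \ldots$, on $\Omega = [0, 1]$ by

$$X_n(\omega) = \begin{cases} 2^k, & \dfrac{m}{2^k} \le \omega < \dfrac{m+1}{2^k}, \\ 0, & \text{otherwise}. \end{cases}$$

Let the probability distribution of $X_n$ be given by $P\{I\}$ = length of the interval $I \subseteq \Omega$. Thus

$$P\{X_n = 2^k\} = \frac{1}{2^k}, \qquad P\{X_n = 0\} = 1 - \frac{1}{2^k}.$$

The limit $\lim_{n \to \infty} X_n(\omega)$ does not exist for any $\omega \in \Omega$, so that $X_n$ does not converge almost surely. But

$$P\{|X_n| > \varepsilon\} = P\{X_n > \varepsilon\} = \begin{cases} 0 & \text{if } \varepsilon \ge 2^k, \\ \dfrac{1}{2^k} & \text{if } 0 < \varepsilon < 2^k, \end{cases}$$

and we see that

$$P\{|X_n| > \varepsilon\} \to 0 \quad \text{as } n \text{ (and hence } k) \to \infty.$$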
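The construction in Example 8 can be made concrete with a short sketch. The function names below are hypothetical, and the half-open intervals $[m/2^k, (m+1)/2^k)$ are an assumption about the intended definition. The code decomposes $n = 2^k + m$ and evaluates $X_n(\omega)$; for a fixed $\omega$ in $[0, 1)$, each block $2^k \le n < 2^{k+1}$ contains exactly one $n$ with $X_n(\omega) = 2^k$, so the sequence $X_n(\omega)$ keeps returning to ever larger values even though $P\{X_n \neq 0\} = 2^{-k} \to 0$.

```python
def decompose(n):
    """Return (k, m) with n = 2**k + m and 0 <= m < 2**k."""
    k = n.bit_length() - 1
    return k, n - (1 << k)

def x_n(n, omega):
    """Value of X_n at the sample point omega in [0, 1] (half-open intervals assumed)."""
    k, m = decompose(n)
    return 2 ** k if m / 2 ** k <= omega < (m + 1) / 2 ** k else 0

def prob_nonzero(n):
    """P{X_n = 2**k}: the length of the interval on which X_n is nonzero."""
    k, _ = decompose(n)
    return 1 / 2 ** k

if __name__ == "__main__":
    omega = 0.3
    print([n for n in range(1, 64) if x_n(n, omega) != 0])  # one index per block
    print([prob_nonzero(n) for n in (1, 2, 4, 8, 16, 32)])
```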


Theorem 13. Let $\{X_n\}$ be a strictly decreasing sequence of positive RVs, and suppose that $X_n \xrightarrow{P} 0$. Then $X_n \xrightarrow{\text{a.s.}} 0$.


Proof. The proof is left as an exercise. 


Example 9. Let $\{X_n\}$ be a sequence of independent RVs defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = 1\} = \frac{1}{n}.$$

Then

$$E|X_n - 0|^2 = E|X_n|^2 = \frac{1}{n} \to 0 \quad \text{as } n \to \infty,$$

so that $X_n \xrightarrow{2} 0$. Also

$$P\{X_n = 0 \text{ for every } m \le n \le n_0\} = \prod_{n=m}^{n_0} \left(1 - \frac{1}{n}\right) = \frac{m-1}{n_0},$$

which converges to 0 as $n_0 \to \infty$ for all values of $m$. Thus $X_n$ does not converge to 0 with probability 1.
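The telescoping product in Example 9 is easy to verify exactly. The following sketch (the function name is hypothetical) multiplies the factors $1 - 1/n$ with exact rational arithmetic, so the result can be compared with $(m-1)/n_0$ without rounding error.

```python
from fractions import Fraction

def prob_all_zero(m, n0):
    """P{X_n = 0 for every m <= n <= n0} for the independent X_n of Example 9."""
    p = Fraction(1)
    for n in range(m, n0 + 1):
        p *= 1 - Fraction(1, n)
    return p

if __name__ == "__main__":
    for n0 in (10, 100, 1000):
        print(n0, prob_all_zero(3, n0))  # equals 2/n0, which shrinks to 0
```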


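Example 10. Let $\{X_n\}$ be independent RVs defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n^r}, \qquad P\{X_n = n\} = \frac{1}{n^r}, \qquad r > 2, \quad n = 1, 2, \ldots.$$

Then

$$P\{X_n = 0 \text{ for } m \le n \le n_0\} = \prod_{n=m}^{n_0} \left(1 - \frac{1}{n^r}\right).$$

As $n_0 \to \infty$, the infinite product converges to some nonzero quantity, which itself converges to 1 as $m \to \infty$. Thus $X_n \xrightarrow{\text{a.s.}} 0$. However, $E|X_n|^r = 1$ and $X_n \overset{r}{\nrightarrow} 0$ as $n \to \infty$.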


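Example 11. Let $\{X_n\}$ be a sequence of RVs with $P\{X_n = \pm 1/n\} = \tfrac{1}{2}$. Then $E|X_n|^r = 1/n^r \to 0$ as $n \to \infty$ and $X_n \xrightarrow{r} 0$. For $j < k$, $|X_j| > |X_k|$, so that $\{|X_k| > \varepsilon\} \subseteq \{|X_j| > \varepsilon\}$. It follows that

$$\bigcup_{j=n}^{\infty} \{|X_j| > \varepsilon\} = \{|X_n| > \varepsilon\}.$$

Choosing $n > 1/\varepsilon$, we see that

$$P\Bigl[\bigcup_{j=n}^{\infty} \{|X_j| > \varepsilon\}\Bigr] = P\{|X_n| > \varepsilon\} \le P\left\{|X_n| > \frac{1}{n}\right\} = 0,$$

and (6) implies that $X_n \xrightarrow{\text{a.s.}} 0$.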


Remark 7. In Theorem 7.4.3 we prove a result which is sometimes useful in proving a.s. 
convergence of a sequence of RVs. 


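Theorem 14. Let $\{X_n, Y_n\}$, $n = 1, 2, \ldots$, be a sequence of pairs of RVs. Then

$$|X_n - Y_n| \xrightarrow{P} 0 \ \text{ and } \ Y_n \xrightarrow{L} Y \ \Rightarrow\ X_n \xrightarrow{L} Y.$$

Proof. Let $x$ be a point of continuity of the DF of $Y$ and $\varepsilon > 0$. Then

$$\begin{aligned}
P\{X_n \le x\} &= P\{Y_n \le x + Y_n - X_n\} \\
&= P\{Y_n \le x + Y_n - X_n;\ Y_n - X_n \le \varepsilon\} + P\{Y_n \le x + Y_n - X_n;\ Y_n - X_n > \varepsilon\} \\
&\le P\{Y_n \le x + \varepsilon\} + P\{Y_n - X_n > \varepsilon\}.
\end{aligned}$$

It follows that

$$\limsup_{n \to \infty} P\{X_n \le x\} \le \limsup_{n \to \infty} P\{Y_n \le x + \varepsilon\}.$$

Similarly

$$\liminf_{n \to \infty} P\{X_n \le x\} \ge \liminf_{n \to \infty} P\{Y_n \le x - \varepsilon\}.$$

Since $\varepsilon > 0$ is arbitrary and $x$ is a continuity point of $P\{Y \le x\}$, we get the result by letting $\varepsilon \to 0$.


Corollary. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{L} X$.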


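Theorem 15 (Slutsky's Theorem). Let $\{X_n, Y_n\}$, $n = 1, 2, \ldots$, be a sequence of pairs of RVs, and let $c$ be a constant. Then

(a) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n \pm Y_n \xrightarrow{L} X \pm c$;
(b) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n Y_n \xrightarrow{L} cX$ if $c \neq 0$, and $X_n Y_n \xrightarrow{P} 0$ if $c = 0$;
(c) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n / Y_n \xrightarrow{L} X/c$ if $c \neq 0$.


Proof. (a) $X_n \xrightarrow{L} X \Rightarrow X_n + c \xrightarrow{L} X + c$ (Theorem 3). Also, $Y_n - c = (Y_n + X_n) - (X_n + c) \xrightarrow{P} 0$. A simple use of Theorem 14 shows that

$$X_n + Y_n \xrightarrow{L} X + c.$$

(b) We first consider the case where $c = 0$. We have, for any fixed number $k > 0$,

$$\begin{aligned}
P\{|X_n Y_n| > \varepsilon\} &= P\left\{|X_n Y_n| > \varepsilon, |Y_n| \le \frac{\varepsilon}{k}\right\} + P\left\{|X_n Y_n| > \varepsilon, |Y_n| > \frac{\varepsilon}{k}\right\} \\
&\le P\{|X_n| > k\} + P\left\{|Y_n| > \frac{\varepsilon}{k}\right\}.
\end{aligned}$$

Since $Y_n \xrightarrow{P} 0$ and $X_n \xrightarrow{L} X$, it follows that, for any fixed $k > 0$,

$$\limsup_{n \to \infty} P\{|X_n Y_n| > \varepsilon\} \le P\{|X| > k\}.$$

Since $k$ is arbitrary, we can make $P\{|X| > k\}$ as small as we please by choosing $k$ large. It follows that

$$X_n Y_n \xrightarrow{P} 0.$$

Now, let $c \neq 0$. Then

$$X_n Y_n - cX_n = X_n (Y_n - c)$$

and, since $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c$, $X_n(Y_n - c) \xrightarrow{P} 0$. Using Theorem 14, we get the result that

$$X_n Y_n \xrightarrow{L} cX.$$

(c) $Y_n \xrightarrow{P} c$, and $c \neq 0 \Rightarrow Y_n^{-1} \xrightarrow{P} c^{-1}$. It follows that $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n Y_n^{-1} \xrightarrow{L} c^{-1}X$, and the proof of the theorem is complete.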


As an application of Theorem 15 we present the following example. 
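Example 12. Let $X_1, X_2, \ldots$ be iid RVs with common law $N(0,1)$. We shall determine the limiting distribution of the RV

$$W_n = \sqrt{n}\, \frac{X_1 + X_2 + \cdots + X_n}{X_1^2 + X_2^2 + \cdots + X_n^2}.$$

Let us write

$$U_n = \frac{1}{\sqrt{n}} (X_1 + X_2 + \cdots + X_n) \quad \text{and} \quad V_n = \frac{X_1^2 + X_2^2 + \cdots + X_n^2}{n}.$$

Then

$$W_n = \frac{U_n}{V_n}.$$

For the MGF of $U_n$, we have

$$M_{U_n}(t) = \prod_{i=1}^{n} E e^{tX_i/\sqrt{n}} = \prod_{i=1}^{n} e^{t^2/2n} = e^{t^2/2},$$

so that $U_n$ is an $N(0,1)$ variate (see also Corollary 2 to Theorem 5.3.22). It follows that $U_n \xrightarrow{L} Z$, where $Z$ is an $N(0,1)$ RV. As for $V_n$, we note that each $X_i^2$ is a chi-square variate with 1 d.f. Thus

$$M_{V_n}(t) = \prod_{i=1}^{n} \left(\frac{1}{1 - 2t/n}\right)^{1/2} = \left(1 - \frac{2t}{n}\right)^{-n/2}, \qquad t < \frac{n}{2},$$

which is the MGF of a gamma variate with parameters $\alpha = n/2$ and $\beta = 2/n$. Thus the density function of $V_n$ is given by

$$f_{V_n}(x) = \begin{cases} \dfrac{1}{\Gamma(n/2)} \dfrac{1}{(2/n)^{n/2}}\, x^{n/2-1} e^{-nx/2}, & 0 < x < \infty, \\ 0, & \text{otherwise}. \end{cases}$$

We will show that $V_n \xrightarrow{P} 1$. We have, for any $\varepsilon > 0$,

$$P\{|V_n - 1| > \varepsilon\} \le \frac{\operatorname{var}(V_n)}{\varepsilon^2} = \left(\frac{n}{2}\right) \left(\frac{2}{n}\right)^2 \frac{1}{\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$

We have thus shown that

$$U_n \xrightarrow{L} Z \quad \text{and} \quad V_n \xrightarrow{P} 1.$$

It follows by Theorem 15 (c) that $W_n = U_n / V_n \xrightarrow{L} Z$, where $Z$ is an $N(0,1)$ RV.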
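The conclusion of Example 12 can be examined by simulation. The sketch below is an informal Monte Carlo check, not a proof; the function names, sample sizes, and seed are arbitrary choices made for illustration. It draws repeated samples of size $n$ from $N(0,1)$, computes $W_n$, and summarizes the empirical mean, variance, and the fraction of values in $[-1.96, 1.96]$, which should be near 0, 1, and 0.95 if $W_n$ is approximately $N(0,1)$.

```python
import math
import random

def w_n(n, rng):
    """One realization of W_n = sqrt(n) * sum(X_i) / sum(X_i**2) for iid N(0,1) X_i."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return math.sqrt(n) * sum(xs) / sum(x * x for x in xs)

def simulate(n=200, reps=2000, seed=0):
    """Return (sample mean, sample variance, fraction with |W_n| <= 1.96)."""
    rng = random.Random(seed)
    ws = [w_n(n, rng) for _ in range(reps)]
    mean = sum(ws) / reps
    var = sum((w - mean) ** 2 for w in ws) / (reps - 1)
    inside = sum(abs(w) <= 1.96 for w in ws) / reps
    return mean, var, inside

if __name__ == "__main__":
    print(simulate())
```

For moderate $n$ the variance of $W_n$ is slightly above 1, because $V_n$ still fluctuates around 1; the excess shrinks as $n$ grows.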




Later on we will see that the condition that the $X_i$'s be $N(0,1)$ is not needed. All we need is that $E|X_i|^2 < \infty$.


PROBLEMS 7.2 


1. Let $X_1, X_2, \ldots$ be a sequence of RVs with corresponding DFs given by $F_n(x) = 0$ if $x < -n$, $= (x+n)/2n$ if $-n \le x < n$, and $= 1$ if $x \ge n$. Does $F_n$ converge to a DF?

2. Let $X_1, X_2, \ldots$ be iid $N(0,1)$ RVs. Consider the sequence of RVs $\{\bar{X}_n\}$, where $\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$. Let $F_n$ be the DF of $\bar{X}_n$, $n = 1, 2, \ldots$. Find $\lim_{n \to \infty} F_n(x)$. Is this limit a DF?

3. Let $X_1, X_2, \ldots$ be iid $U(0, \theta)$ RVs. Let $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$, and consider the sequence $Y_n = nX_{(1)}$. Does $Y_n$ converge in distribution to some RV $Y$? If so, find the DF of RV $Y$.

4. Let $X_1, X_2, \ldots$ be iid RVs with common absolutely continuous DF $F$. Let $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$, and consider the sequence of RVs $Y_n = n[1 - F(X_{(n)})]$. Find the limiting DF of $Y_n$.

5. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common PDF $f(x) = e^{-x+\theta}$ if $x \ge \theta$, and $= 0$ if $x < \theta$. Write $\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$.
(a) Show that $\bar{X}_n \xrightarrow{P} 1 + \theta$.
(b) Show that $\min\{X_1, X_2, \ldots, X_n\} \xrightarrow{P} \theta$.

6. Let $X_1, X_2, \ldots$ be iid $U[0, \theta]$ RVs. Show that $\max\{X_1, X_2, \ldots, X_n\} \xrightarrow{P} \theta$.

7. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{L} X$. Let $a_n$ be a sequence of positive constants such that $a_n \to \infty$ as $n \to \infty$. Show that $a_n^{-1} X_n \xrightarrow{P} 0$.

8. Let $\{X_n\}$ be a sequence of RVs such that $P\{|X_n| \le k\} = 1$ for all $n$ and some constant $k > 0$. Suppose that $X_n \xrightarrow{P} X$. Show that $X_n \xrightarrow{r} X$ for any $r > 0$.


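9. Let $X_1, X_2, \ldots, X_{2n}$ be iid $N(0,1)$ RVs. Define

$$U_n = \left\{\frac{X_1}{X_2} + \frac{X_3}{X_4} + \cdots + \frac{X_{2n-1}}{X_{2n}}\right\}, \qquad V_n = X_1^2 + X_2^2 + \cdots + X_n^2, \qquad \text{and} \qquad Z_n = \frac{U_n}{V_n}.$$

Find the limiting distribution of $Z_n$.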
10. Let $\{X_n\}$ be a sequence of geometric RVs with parameter $\lambda/n$, $n > \lambda > 0$. Also, let $Z_n = X_n/n$. Show that $Z_n \xrightarrow{L} G(1, 1/\lambda)$ as $n \to \infty$ (Prochaska [82]).

11. Let $X_n$ be a sequence of RVs such that $X_n \xrightarrow{\text{a.s.}} 0$, and let $c_n$ be a sequence of real numbers such that $c_n \to 0$ as $n \to \infty$. Show that $X_n + c_n \xrightarrow{\text{a.s.}} 0$.

12. Does convergence almost surely imply convergence of moments?

13. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common DF $F$, and write $X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$, $n = 1, 2, \ldots$.

(a) For $\alpha > 0$, $\lim_{x \to \infty} x^{\alpha} P\{X_1 > x\} = b > 0$. Find the limiting distribution of $(bn)^{-1/\alpha} X_{(n)}$. Also, find the PDF corresponding to the limiting DF and compute its moments.

(b) If $F$ satisfies

$$\lim_{x \to \infty} e^{x}[1 - F(x)] = b > 0,$$

find the limiting DF of $X_{(n)} - \log(bn)$ and compute the corresponding PDF and the MGF.

(c) If $X_i$ is bounded above by $x_0$ with probability 1, and for some $\alpha > 0$

$$\lim_{x \to x_0 -} (x_0 - x)^{-\alpha}[1 - F(x)] = b > 0,$$

find the limiting distribution of $(bn)^{1/\alpha}\{X_{(n)} - x_0\}$, the corresponding PDF, and the moments of the limiting distribution.

(The above remarkable result, due to Gnedenko [36], exhausts all limiting distributions of $X_{(n)}$ with suitable norming and centering.)

14. Let $\{F_n\}$ be a sequence of DFs that converges weakly to a DF $F$ which is continuous everywhere. Show that $F_n(x)$ converges to $F(x)$ uniformly.

15. Prove Theorem 1.

16. Prove Theorem 6.

17. Prove Theorem 13.

18. Prove Corollary 1 to Theorem 8.

19. Let $V$ be the class of all random variables defined on a probability space with finite expectations, and for $X \in V$ define

$$\rho(X) = E\left\{\frac{|X|}{1 + |X|}\right\}.$$

Show the following:

(a) $\rho(X + Y) \le \rho(X) + \rho(Y)$; $\rho(\sigma X) \le \max(|\sigma|, 1)\rho(X)$.

(b) $d(X, Y) = \rho(X - Y)$ is a distance function on $V$ (assuming that we identify RVs that are a.s. equal).

(c) $\lim_{n \to \infty} d(X_n, X) = 0 \iff X_n \xrightarrow{P} X$.

20. For the following sequences of RVs $\{X_n\}$, investigate convergence in probability and convergence in $r$th mean.

(a) $X_n \sim \mathcal{C}(1/n, 0)$.

(b) $P(X_n = e^n) = \dfrac{1}{n^2}$, $P(X_n = 0) = 1 - \dfrac{1}{n^2}$.


7.3 WEAK LAW OF LARGE NUMBERS 


Let $\{X_n\}$ be a sequence of RVs. Write $S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \ldots$. In this section we answer the following question in the affirmative: Do there exist sequences of constants $A_n$ and $B_n > 0$, $B_n \to \infty$ as $n \to \infty$, such that the sequence of RVs $B_n^{-1}(S_n - A_n)$ converges in probability to 0 as $n \to \infty$?

Definition 1. Let $\{X_n\}$ be a sequence of RVs, and let $S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \ldots$. We say that $\{X_n\}$ obeys the weak law of large numbers (WLLN) with respect to the sequence of constants $\{B_n\}$, $B_n > 0$, $B_n \uparrow \infty$, if there exists a sequence of real constants $A_n$ such that $B_n^{-1}(S_n - A_n) \xrightarrow{P} 0$ as $n \to \infty$. The $A_n$ are called centering constants and the $B_n$ norming constants.


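Theorem 1. Let $\{X_n\}$ be a sequence of pairwise uncorrelated RVs with $EX_i = \mu_i$ and $\operatorname{var}(X_i) = \sigma_i^2$, $i = 1, 2, \ldots$. If $\sum_{i=1}^{n} \sigma_i^2 \to \infty$ as $n \to \infty$, we can choose $A_n = \sum_{k=1}^{n} \mu_k$ and $B_n = \sum_{i=1}^{n} \sigma_i^2$, that is,

$$\sum_{i=1}^{n} \frac{X_i - \mu_i}{\sum_{i=1}^{n} \sigma_i^2} \xrightarrow{P} 0 \quad \text{as } n \to \infty.$$

Proof. We have, by Chebychev's inequality,

$$P\Bigl\{\Bigl|S_n - \sum_{k=1}^{n} \mu_k\Bigr| > \varepsilon \sum_{i=1}^{n} \sigma_i^2\Bigr\} \le \frac{E\{\sum_{i=1}^{n} (X_i - \mu_i)\}^2}{\varepsilon^2 (\sum_{i=1}^{n} \sigma_i^2)^2} = \frac{1}{\varepsilon^2 \sum_{i=1}^{n} \sigma_i^2} \to 0 \quad \text{as } n \to \infty.$$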


Corollary 1. If the $X_n$'s are identically distributed and pairwise uncorrelated with $EX_i = \mu$ and $\operatorname{var}(X_i) = \sigma^2 < \infty$, we can choose $A_n = n\mu$ and $B_n = n\sigma^2$.


Corollary 2. In Theorem 1 we can choose $B_n = n$, provided that $n^{-2} \sum_{i=1}^{n} \sigma_i^2 \to 0$ as $n \to \infty$.


Corollary 3. In Corollary 1, we can take $A_n = n\mu$ and $B_n = n$, since $n\sigma^2/n^2 \to 0$ as $n \to \infty$. Thus, if $\{X_n\}$ are pairwise-uncorrelated identically distributed RVs with finite variance, $S_n/n \xrightarrow{P} \mu$.


Example 1. Let $X_1, X_2, \ldots$ be iid RVs with common law $b(1, p)$. Then $EX_i = p$, $\operatorname{var}(X_i) = p(1-p)$, and we have

$$\frac{S_n}{n} \xrightarrow{P} p \quad \text{as } n \to \infty.$$

Note that $S_n/n$ is the proportion of successes in $n$ trials. In particular, recall from Section 6.3 that $nF_n^*(x)$ is a $b(n, F(x))$ RV. It follows that for each $x \in \mathbb{R}$,

$$F_n^*(x) \xrightarrow{P} F(x) \quad \text{as } n \to \infty.$$
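The statement of Example 1 can be illustrated numerically. The sketch below is an informal illustration with hypothetical function names and an arbitrary seed. It computes the Chebyshev upper bound $p(1-p)/(n\varepsilon^2)$ on $P\{|S_n/n - p| > \varepsilon\}$, which follows from $\operatorname{var}(S_n/n) = p(1-p)/n$, and simulates one sample proportion for comparison.

```python
import random

def chebyshev_bound(p, n, eps):
    """Upper bound on P{|S_n/n - p| > eps} for iid b(1, p) trials."""
    return p * (1 - p) / (n * eps * eps)

def proportion(p, n, seed=0):
    """Simulated proportion of successes S_n/n in n Bernoulli(p) trials."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n)) / n

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, proportion(0.3, n), chebyshev_bound(0.3, n, 0.05))
```

The bound decreases like $1/n$, which is all that is needed for convergence in probability; it is usually far from sharp.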


Hereafter, we shall be interested mainly in the case where B,, = n. When we say that 
{X,,} obeys the WLLN, this is so with respect to the sequence {n}. 


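Theorem 2. Let $\{X_n\}$ be any sequence of RVs. Write $Y_n = n^{-1} \sum_{k=1}^{n} X_k$. A necessary and sufficient condition for the sequence $\{X_n\}$ to satisfy the weak law of large numbers is that

$$E\left\{\frac{Y_n^2}{1 + Y_n^2}\right\} \to 0 \quad \text{as } n \to \infty. \tag{1}$$

Proof. For any two positive numbers $a$, $b$, $a \ge b > 0$, we have

$$\left(\frac{a}{1+a}\right) \left(\frac{1+b}{b}\right) \ge 1. \tag{2}$$

Let $A = \{|Y_n| \ge \varepsilon\}$. Then $\omega \in A \Rightarrow |Y_n|^2 \ge \varepsilon^2 > 0$. Using (2), we see that $\omega \in A$ implies

$$\frac{Y_n^2}{1 + Y_n^2}\, \frac{1 + \varepsilon^2}{\varepsilon^2} \ge 1.$$

It follows that

$$PA \le P\left\{\frac{Y_n^2}{1 + Y_n^2} \ge \frac{\varepsilon^2}{1 + \varepsilon^2}\right\} \le \frac{E[Y_n^2/(1 + Y_n^2)]}{\varepsilon^2/(1 + \varepsilon^2)} \quad \text{(by Markov's inequality)} \quad \to 0 \text{ as } n \to \infty.$$

That is,

$$Y_n \xrightarrow{P} 0 \quad \text{as } n \to \infty.$$

Conversely, we will show that for every $\varepsilon > 0$

$$P\{|Y_n| \ge \varepsilon\} \ge E\left\{\frac{Y_n^2}{1 + Y_n^2}\right\} - \varepsilon^2. \tag{3}$$

We will prove (3) for the case in which $Y_n$ is of the continuous type. The discrete case being similar, we ask the reader to complete the proof. If $Y_n$ has PDF $f_n(y)$, then

$$\begin{aligned}
\int_{-\infty}^{\infty} \frac{y^2}{1 + y^2} f_n(y)\, dy &= \left(\int_{|y| > \varepsilon} + \int_{|y| \le \varepsilon}\right) \frac{y^2}{1 + y^2} f_n(y)\, dy \\
&\le P\{|Y_n| > \varepsilon\} + \int_{-\varepsilon}^{\varepsilon} \left(1 - \frac{1}{1 + y^2}\right) f_n(y)\, dy \\
&\le P\{|Y_n| > \varepsilon\} + \frac{\varepsilon^2}{1 + \varepsilon^2} \le P\{|Y_n| > \varepsilon\} + \varepsilon^2,
\end{aligned}$$

which is (3).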


Remark 1. Since condition (1) applies not to the individual variables but to their sum, Theorem 2 is of limited use. We note, however, that all weak laws of large numbers obtained as corollaries to Theorem 1 follow easily from Theorem 2 (Problem 6).
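Example 2. Let $(X_1, X_2, \ldots, X_n)$ be jointly normal with $EX_i = 0$, $EX_i^2 = 1$ for all $i$, and $\operatorname{cov}(X_i, X_j) = \rho$ if $|j - i| = 1$, and $= 0$ otherwise. Then $S_n = \sum_{k=1}^{n} X_k$ is $N(0, \sigma^2)$, where

$$\sigma^2 = \operatorname{var}(S_n) = n + 2(n-1)\rho,$$

and

$$\begin{aligned}
E\left\{\frac{Y_n^2}{1 + Y_n^2}\right\} = E\left\{\frac{S_n^2}{n^2 + S_n^2}\right\} &= \frac{2}{\sigma\sqrt{2\pi}} \int_0^{\infty} \frac{x^2}{n^2 + x^2}\, e^{-x^2/2\sigma^2}\, dx \\
&= \frac{2}{\sqrt{2\pi}} \int_0^{\infty} \frac{y^2[n + 2(n-1)\rho]}{n^2 + y^2[n + 2(n-1)\rho]}\, e^{-y^2/2}\, dy \\
&\le \frac{n + 2(n-1)\rho}{n^2} \int_0^{\infty} \frac{2}{\sqrt{2\pi}}\, y^2 e^{-y^2/2}\, dy \to 0 \quad \text{as } n \to \infty.
\end{aligned}$$

It follows from Theorem 2 that $n^{-1} S_n \xrightarrow{P} 0$. We invite the reader to compare this result to that of Problem 7.5.6.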


Example 3. Let $X_1, X_2, \ldots$ be iid $\mathcal{C}(1, 0)$ RVs. We have seen (corollary to Theorem 5.3.18) that $n^{-1} S_n \sim \mathcal{C}(1, 0)$, so that $n^{-1} S_n$ does not converge in probability to 0. It follows that the WLLN does not hold (see also Problem 10).


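Let $X_1, X_2, \ldots$ be an arbitrary sequence of RVs, and let $S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \ldots$. Let us truncate each $X_i$ at $c > 0$, that is, let

$$X_i^c = \begin{cases} X_i & \text{if } |X_i| \le c, \\ 0 & \text{if } |X_i| > c, \end{cases} \qquad i = 1, 2, \ldots, n.$$

Write

$$S_n^c = \sum_{i=1}^{n} X_i^c \quad \text{and} \quad m_n = \sum_{i=1}^{n} EX_i^c.$$

Lemma 1. For any $\varepsilon > 0$,

$$P\{|S_n - m_n| > \varepsilon\} \le P\{|S_n^c - m_n| > \varepsilon\} + \sum_{k=1}^{n} P\{|X_k| > c\}. \tag{4}$$

Proof. We have

$$\begin{aligned}
P\{|S_n - m_n| > \varepsilon\} &= P\{|S_n - m_n| > \varepsilon \text{ and } |X_k| \le c \text{ for } k = 1, 2, \ldots, n\} \\
&\quad + P\{|S_n - m_n| > \varepsilon \text{ and } |X_k| > c \text{ for at least one } k,\ k = 1, 2, \ldots, n\} \\
&\le P\{|S_n^c - m_n| > \varepsilon\} + P\{|X_k| > c \text{ for at least one } k,\ 1 \le k \le n\} \\
&\le P\{|S_n^c - m_n| > \varepsilon\} + \sum_{k=1}^{n} P\{|X_k| > c\}.
\end{aligned}$$

Corollary. If $X_1, X_2, \ldots, X_n$ are exchangeable, then

$$P\{|S_n - m_n| > \varepsilon\} \le P\{|S_n^c - m_n| > \varepsilon\} + nP\{|X_1| > c\}. \tag{5}$$

If, in addition, the RVs $X_1, X_2, \ldots, X_n$ are independent, then

$$P\{|S_n - m_n| > \varepsilon\} \le \frac{nE(X_1^c)^2}{\varepsilon^2} + nP\{|X_1| > c\}. \tag{6}$$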


Inequality (6) yields the following important theorem. 


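Theorem 3. Let $\{X_n\}$ be a sequence of iid RVs with common finite mean $\mu = EX_1$. Then

$$n^{-1} S_n \xrightarrow{P} \mu \quad \text{as } n \to \infty.$$

Proof. Let us take $c = n$ in (6) and replace $\varepsilon$ by $n\varepsilon$; then we have

$$P\{|S_n - m_n| > n\varepsilon\} \le \frac{1}{n\varepsilon^2} E(X_1^n)^2 + nP\{|X_1| > n\},$$

where $X_1^n$ is $X_1$ truncated at $n$.
First note that $E|X_1| < \infty \Rightarrow nP\{|X_1| > n\} \to 0$ as $n \to \infty$. Now (see remarks following Lemma 3.2.1)

$$E(X_1^n)^2 = 2 \int_0^n x P\{|X_1^n| > x\}\, dx \le 2 \left(\int_0^A + \int_A^n\right) x P\{|X_1| > x\}\, dx,$$

where $A$ is chosen sufficiently large that

$$x P\{|X_1| > x\} < \frac{\delta}{2} \quad \text{for all } x \ge A, \quad \delta > 0 \text{ arbitrary}.$$

Thus

$$E(X_1^n)^2 \le c + \delta \int_A^n dx \le c + n\delta,$$

where $c$ is a constant. It follows that

$$\frac{1}{n\varepsilon^2} E(X_1^n)^2 \le \frac{c}{n\varepsilon^2} + \frac{\delta}{\varepsilon^2},$$

and since $\delta$ is arbitrary, $(1/n\varepsilon^2)E(X_1^n)^2$ can be made arbitrarily small for sufficiently large $n$. The proof is now completed by the simple observation that, since $EX_1^n \to \mu$,

$$\frac{m_n}{n} \to \mu \quad \text{as } n \to \infty.$$

We emphasize that in Theorem 3 we require only that $E|X_1| < \infty$; nothing is said about the variance. Theorem 3 is due to Khintchine.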


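Example 4. Let $X_1, X_2, \ldots$ be iid RVs with $E|X_1|^k < \infty$ for some positive integer $k$. Then

$$\sum_{j=1}^{n} \frac{X_j^k}{n} \xrightarrow{P} EX_1^k \quad \text{as } n \to \infty.$$

Thus, if $EX_1^2 < \infty$, then $\sum_{j=1}^{n} X_j^2/n \xrightarrow{P} EX_1^2$, and since $\bigl(\sum_{j=1}^{n} X_j/n\bigr)^2 \xrightarrow{P} (EX_1)^2$ it follows that

$$\frac{\sum X_j^2}{n} - \left(\frac{\sum X_j}{n}\right)^2 \xrightarrow{P} \operatorname{var}(X_1).$$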
n n 


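Example 5. Let $X_1, X_2, \ldots$ be iid RVs with common PDF

$$f(x) = \begin{cases} \dfrac{1 + \delta}{x^{2+\delta}}, & x \ge 1, \\ 0, & x < 1, \end{cases} \qquad \delta > 0.$$

Then

$$E|X| = (1 + \delta) \int_1^{\infty} \frac{1}{x^{1+\delta}}\, dx = \frac{1 + \delta}{\delta} < \infty,$$

and the law of large numbers holds, that is,

$$n^{-1} S_n \xrightarrow{P} \frac{1 + \delta}{\delta} \quad \text{as } n \to \infty.$$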
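The limit in Example 5 can be checked by simulation. The sketch below is illustrative only: the function name, seed, and the choice $\delta = 3$ are assumptions made for the demonstration. It uses the inverse DF $F^{-1}(u) = (1-u)^{-1/(1+\delta)}$, which follows from $F(x) = 1 - x^{-(1+\delta)}$ for $x \ge 1$, to generate the sample, and compares the sample mean with $(1+\delta)/\delta$.

```python
import random

def sample_mean(delta, n, seed=0):
    """Mean of n iid draws from the PDF (1 + delta) / x**(2 + delta), x >= 1."""
    rng = random.Random(seed)
    a = 1 + delta
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the power is finite.
    return sum((1 - rng.random()) ** (-1 / a) for _ in range(n)) / n

if __name__ == "__main__":
    delta = 3.0
    for n in (100, 10000, 200000):
        print(n, sample_mean(delta, n), (1 + delta) / delta)
```

For small $\delta$ the variance is infinite and the convergence guaranteed by Theorem 3 can be very slow, so a simulation with $\delta \le 1$ may look erratic even for large $n$.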


PROBLEMS 7.3 


1. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common uniform distribution on $(0, 1]$. Also, let $Z_n = \bigl(\prod_{i=1}^{n} X_i\bigr)^{1/n}$ be the geometric mean of $X_1, X_2, \ldots, X_n$, $n = 1, 2, \ldots$. Show that $Z_n \xrightarrow{P} c$, where $c$ is some constant. Find $c$.

2. Let $X_1, X_2, \ldots$ be iid RVs with finite second moment. Let

$$Y_n = \frac{2}{n(n+1)} \sum_{i=1}^{n} i X_i.$$

Show that $Y_n \xrightarrow{P} EX_1$.

3. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with $EX_i = \mu$ and $\operatorname{var}(X_i) = \sigma^2$. Let $S_k = \sum_{j=1}^{k} X_j$. Does the sequence $S_k$ obey the WLLN in the sense of Definition 1? If so, find the centering and the norming constants.

4. Let $\{X_n\}$ be a sequence of RVs for which $\operatorname{var}(X_n) \le C$ for all $n$ and $\rho_{ij} = \operatorname{cov}(X_i, X_j) \to 0$ as $|i - j| \to \infty$. Show that the WLLN holds.


. For the following sequences of independent RVs does the WLLN hold? 


(a) P{X, = +h) — 5. 

(b) P{X, = +k} = 1/2Vk, P{X, = 0} = 1— (1/V&). 
CO Py H=27) a2 Pika 0H 1 = 12), 
(d) P{X, =+1/k} =1/2. 

(ce) P{X, =+Vk} =F. 


6. Let $X_1, X_2, \ldots$ be a sequence of independent RVs such that $\operatorname{var}(X_k) < \infty$ for $k = 1, 2, \ldots$, and $(1/n^2) \sum_{k=1}^{n} \operatorname{var}(X_k) \to 0$ as $n \to \infty$. Prove the WLLN, using Theorem 2.

7. Let $X_n$ be a sequence of RVs with common finite variance $\sigma^2$. Suppose that the correlation coefficient between $X_i$ and $X_j$ is $\le 0$ for all $i \neq j$. Show that the WLLN holds for the sequence $\{X_n\}$.

8. Let $\{X_n\}$ be a sequence of RVs such that $X_k$ is independent of $X_j$ for $j \neq k+1$ or $j \neq k-1$. If $\operatorname{var}(X_k) < C$ for all $k$, where $C$ is some constant, the WLLN holds for $\{X_k\}$.

9. For any sequence of RVs $\{X_n\}$ show that

$$\max_{1 \le k \le n} |X_k| \xrightarrow{P} 0 \Rightarrow n^{-1} S_n \xrightarrow{P} 0.$$

10. Let $X_1, X_2, \ldots$ be iid $\mathcal{C}(1, 0)$ RVs. Use Theorem 2 to show that the weak law of large numbers does not hold. That is, show that

$$E\left\{\frac{S_n^2}{n^2 + S_n^2}\right\} \nrightarrow 0 \quad \text{as } n \to \infty, \quad \text{where } S_n = \sum_{k=1}^{n} X_k,\ n = 1, 2, \ldots.$$

11. Let $\{X_n\}$ be a sequence of iid RVs with $P\{X_n \ge 0\} = 1$. Let $S_n = \sum_{j=1}^{n} X_j$, $n = 1, 2, \ldots$. Suppose $\{a_n\}$ is a sequence of constants such that $a_n^{-1} S_n \xrightarrow{P} 1$. Show that (a) $a_n \to \infty$ as $n \to \infty$ and (b) $a_{n+1}/a_n \to 1$.


7.4 STRONG LAW OF LARGE NUMBERS†


In this section we obtain a stronger form of the law of large numbers discussed in
Section 7.3. Let $X_1, X_2, \ldots$ be a sequence of RVs defined on some probability space
$(\Omega, \mathcal{S}, P)$.


† This section may be omitted on the first reading.

Definition 1. We say that the sequence $\{X_n\}$ obeys the strong law of large numbers
(SLLN) with respect to the norming constants $\{B_n\}$ if there exists a sequence of (centering)
constants $\{A_n\}$ such that

$$B_n^{-1}(S_n - A_n) \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty. \tag{1}$$

Here $B_n > 0$ and $B_n \to \infty$ as $n \to \infty$.


We will obtain sufficient conditions for a sequence $\{X_n\}$ to obey the SLLN. In what follows,
we will be interested mainly in the case $B_n = n$. Indeed, when we speak of the SLLN
we will assume that we are speaking of the norming constants $B_n = n$, unless specified
otherwise.

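We start with the Borel-Cantelli lemma. Let $\{A_j\}$ be any sequence of events in $\mathcal{S}$. We
recall that

$$\limsup_{n\to\infty} A_n = \lim_{n\to\infty} \bigcup_{k=n}^\infty A_k = \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty A_k. \tag{2}$$

We will write $A = \limsup_{n\to\infty} A_n$. Note that $A$ is the event that infinitely many of the $A_n$ occur.
We will sometimes write

$$PA = P\left(\limsup_{n\to\infty} A_n\right) = P(A_n \text{ i.o.}),$$

where "i.o." stands for "infinitely often." In view of Theorem 7.2.11 and Remark 7.2.6 we
have $X_n \xrightarrow{a.s.} 0$ if and only if $P\{|X_n| > \varepsilon \text{ i.o.}\} = 0$ for all $\varepsilon > 0$.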


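Theorem 1 (Borel-Cantelli Lemma).

(a) Let $\{A_n\}$ be a sequence of events such that $\sum_{n=1}^\infty PA_n < \infty$. Then $PA = 0$.

(b) If $\{A_n\}$ is an independent sequence of events such that $\sum_{n=1}^\infty PA_n = \infty$, then
$PA = 1$.

Proof.

(a) $PA = P\left(\lim_{n\to\infty} \bigcup_{k=n}^\infty A_k\right) = \lim_{n\to\infty} P\left(\bigcup_{k=n}^\infty A_k\right) \le \lim_{n\to\infty} \sum_{k=n}^\infty PA_k = 0$.

(b) We have $A^c = \bigcup_{n=1}^\infty \bigcap_{k=n}^\infty A_k^c$, so that

$$PA^c = P\left(\lim_{n\to\infty} \bigcap_{k=n}^\infty A_k^c\right) = \lim_{n\to\infty} P\left(\bigcap_{k=n}^\infty A_k^c\right).$$

For $n_0 > n$, we see that $\bigcap_{k=n}^\infty A_k^c \subset \bigcap_{k=n}^{n_0} A_k^c$, so that

$$P\left(\bigcap_{k=n}^\infty A_k^c\right) \le \lim_{n_0\to\infty} P\left(\bigcap_{k=n}^{n_0} A_k^c\right) = \lim_{n_0\to\infty} \prod_{k=n}^{n_0} (1 - PA_k),$$

because $\{A_n\}$ is an independent sequence of events. Now we use the elementary
inequality

$$1 - \exp\left(-\sum_{j=n}^{n_0} \alpha_j\right) \le 1 - \prod_{j=n}^{n_0} (1 - \alpha_j) \le \sum_{j=n}^{n_0} \alpha_j, \qquad n_0 > n,\ 1 \ge \alpha_j \ge 0,$$

to conclude that

$$P\left(\bigcap_{k=n}^\infty A_k^c\right) \le \lim_{n_0\to\infty} \exp\left(-\sum_{k=n}^{n_0} PA_k\right).$$

Since the series $\sum_{n=1}^\infty PA_n$ diverges, it follows that $PA^c = 0$ or $PA = 1$.

Corollary. Let $\{A_n\}$ be a sequence of independent events. Then $PA$ is either 0 or 1.

The corollary follows since $\sum_{n=1}^\infty PA_n$ either converges or diverges.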
As a simple application of the Borel—Cantelli lemma, we obtain a version of the SLLN. 


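Theorem 2. If $X_1, X_2, \ldots$ are iid RVs with common mean $\mu$ and finite fourth moment,
then

$$P\left\{\lim_{n\to\infty} \frac{S_n}{n} = \mu\right\} = 1.$$

Proof. We have

$$E\left\{\sum_{i=1}^n (X_i - \mu)\right\}^4 = nE(X_1 - \mu)^4 + 6\binom{n}{2}\sigma^4 \le Cn^2.$$

By Markov's inequality

$$P\left\{\left|\sum_{i=1}^n (X_i - \mu)\right| > n\varepsilon\right\} \le \frac{E\left\{\sum_{i=1}^n (X_i - \mu)\right\}^4}{(n\varepsilon)^4} \le \frac{Cn^2}{(n\varepsilon)^4} = \frac{C'}{n^2}.$$

Therefore,

$$\sum_{n=1}^\infty P\{|S_n - \mu n| > n\varepsilon\} < \infty,$$

and it follows by the Borel-Cantelli lemma that with probability 1 only finitely many of
the events $\{\omega : |(S_n/n) - \mu| > \varepsilon\}$ occur, that is, $PA_\varepsilon = 0$, where

$$A_\varepsilon = \limsup_{n\to\infty} \left\{\left|\frac{S_n}{n} - \mu\right| > \varepsilon\right\}.$$

The sets $A_\varepsilon$ increase, as $\varepsilon \to 0$, to the $\omega$ set on which $S_n/n \not\to \mu$. Letting $\varepsilon \to 0$ through a
countable set of values, we have

$$P\left\{\frac{S_n}{n} - \mu \not\to 0\right\} = P\left(\bigcup_k A_{1/k}\right) = 0.$$

Corollary. If $X_1, X_2, \ldots$ are iid RVs such that $P\{|X_n| < K\} = 1$ for all $n$, where $K$ is a
positive constant, then $n^{-1}S_n \xrightarrow{a.s.} \mu$.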


Theorem 3. Let $X_1, X_2, \ldots$ be a sequence of independent RVs. Then

$$X_n \xrightarrow{a.s.} 0 \iff \sum_{n=1}^\infty P\{|X_n| > \varepsilon\} < \infty \quad \text{for all } \varepsilon > 0.$$

Proof. Writing $A_n = \{|X_n| > \varepsilon\}$, we see that $\{A_n\}$ is a sequence of independent events.
Since $X_n \xrightarrow{a.s.} 0$, $X_n \to 0$ on a set $E^c$ with $PE = 0$. A point $\omega \in E^c$ belongs only to a finite
number of $A_n$. It follows that

$$\limsup_{n\to\infty} A_n \subset E,$$

hence, $P(A_n \text{ i.o.}) = 0$. By the Borel-Cantelli lemma (Theorem 1(b)) we must have
$\sum_{n=1}^\infty PA_n < \infty$. (Otherwise, $\sum_{n=1}^\infty PA_n = \infty$, and then $P(A_n \text{ i.o.}) = 1$.)

In the other direction, let

$$A_{1/k} = \limsup_{n\to\infty} \left\{|X_n| > \frac{1}{k}\right\},$$

and use the argument in the proof of Theorem 2.


Example 1. We take an application of Borel—Cantelli Lemma to prove a.s. convergence. 
Let {X,,} have PMF 


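Then $P(|X_n| > \varepsilon) = 1/n^\alpha$ and it follows that

$$\sum_{n=1}^\infty P(|X_n| > \varepsilon) = \sum_{n=1}^\infty \frac{1}{n^\alpha} < \infty \quad \text{for } \alpha > 1.$$

Thus from the Borel-Cantelli lemma $P(A_n \text{ i.o.}) = 0$, where $A_n = \{|X_n| > \varepsilon\}$. Now using the
argument in the proof of Theorem 2 we can show that $P\{X_n \not\to 0\} = 0$.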


We next prove some important lemmas that we will need subsequently. 
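Lemma 1 (Kolmogorov's Inequality). Let $X_1, X_2, \ldots, X_n$ be independent RVs with common
mean 0 and variances $\sigma_k^2$, $k = 1, 2, \ldots, n$, respectively. Then for any $\varepsilon > 0$

$$P\left\{\max_{1 \le k \le n} |S_k| > \varepsilon\right\} \le \sum_{i=1}^n \frac{\sigma_i^2}{\varepsilon^2}. \tag{3}$$

Proof. Let $A_0 = \Omega$,

$$A_k = \left\{\max_{1 \le j \le k} |S_j| \le \varepsilon\right\}, \qquad k = 1, 2, \ldots, n,$$

and

$$\begin{aligned}
B_k &= A_{k-1} \cap A_k^c \\
    &= \{|S_1| \le \varepsilon, \ldots, |S_{k-1}| \le \varepsilon\} \cap \{\text{at least one of } |S_1|, \ldots, |S_k| \text{ is } > \varepsilon\} \\
    &= \{|S_1| \le \varepsilon, \ldots, |S_{k-1}| \le \varepsilon, |S_k| > \varepsilon\}.
\end{aligned}$$

It follows that

$$A_n^c = \sum_{k=1}^n B_k$$

(a union of disjoint events) and

$$B_k \subset \{|S_{k-1}| \le \varepsilon, |S_k| > \varepsilon\}.$$

As usual, let us write $I_{B_k}$ for the indicator function of the event $B_k$. Then

$$\begin{aligned}
E(S_n I_{B_k})^2 &= E\{(S_n - S_k)I_{B_k} + S_k I_{B_k}\}^2 \\
                 &= E\{(S_n - S_k)^2 I_{B_k} + S_k^2 I_{B_k} + 2S_k(S_n - S_k)I_{B_k}\}.
\end{aligned}$$

Since $S_n - S_k = X_{k+1} + \cdots + X_n$ and $S_k I_{B_k}$ are independent, and $EX_k = 0$ for all $k$, it follows
that

$$E(S_n I_{B_k})^2 = E\{(S_n - S_k)I_{B_k}\}^2 + E(S_k I_{B_k})^2 \ge E(S_k I_{B_k})^2 \ge \varepsilon^2 PB_k.$$

The last inequality follows from the fact that, in $B_k$, $|S_k| > \varepsilon$. Moreover,

$$\sum_{k=1}^n E(S_n I_{B_k})^2 = E(S_n^2 I_{A_n^c}) \le E(S_n^2) = \sum_{1}^n \sigma_k^2,$$

so that

$$\sum_{1}^n \sigma_k^2 \ge \varepsilon^2 \sum_{1}^n PB_k = \varepsilon^2 P(A_n^c),$$

as asserted.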
Corollary. Take $n = 1$; then

$$P\{|X_1| > \varepsilon\} \le \frac{\sigma_1^2}{\varepsilon^2},$$

which is Chebychev's inequality.
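Kolmogorov's inequality (3) can be checked by simulation. The sketch below is illustrative only: it assumes independent steps taking the values $\pm 1$ with probability $1/2$ each (mean 0, variance 1), and the values $n = 100$, $\varepsilon = 25$, the trial count, the seed, and the helper name `max_partial_sum_exceeds` are hypothetical choices, not from the text.

```python
import random


def max_partial_sum_exceeds(n, eps, trials, seed=1):
    """Estimate P{max_{k<=n} |S_k| > eps} for sums of independent +/-1 steps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            if abs(s) > eps:  # the running maximum has crossed eps
                hits += 1
                break
    return hits / trials


n, eps = 100, 25
estimate = max_partial_sum_exceeds(n, eps, trials=5_000)
bound = n * 1.0 / eps**2  # right side of (3): sum of the variances over eps^2
print(estimate, bound)
```

The estimated probability should fall below the bound $n/\varepsilon^2 = 0.16$. The bound is typically far from tight, which is consistent with its role as a Chebychev-type inequality.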


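Lemma 2 (Kronecker Lemma). If $\sum_{n=1}^\infty x_n$ converges to $s$ (finite) and $b_n \uparrow \infty$, then

$$b_n^{-1} \sum_{k=1}^n b_k x_k \to 0.$$

Proof. Writing $b_0 = 0$, $a_k = b_k - b_{k-1}$, and $s_{n+1} = \sum_{k=1}^n x_k$, we have

$$\begin{aligned}
\frac{1}{b_n}\sum_{k=1}^n b_k x_k &= \frac{1}{b_n}\sum_{k=1}^n b_k(s_{k+1} - s_k) \\
 &= \frac{1}{b_n}\left(b_n s_{n+1} + \sum_{k=1}^n b_{k-1}s_k - \sum_{k=1}^n b_k s_k\right) = s_{n+1} - \frac{1}{b_n}\sum_{k=1}^n a_k s_k.
\end{aligned}$$

It therefore suffices to show that $b_n^{-1}\sum_{k=1}^n a_k s_k \to s$. Since $s_n \to s$, there exists an $n_0 = n_0(\varepsilon)$
such that

$$|s_n - s| < \frac{\varepsilon}{2} \quad \text{for } n > n_0.$$

Since $b_n \uparrow \infty$, let $n_1$ be an integer $> n_0$ such that

$$\left|b_n^{-1} \sum_{k=1}^{n_0} (b_k - b_{k-1})(s_k - s)\right| < \frac{\varepsilon}{2} \quad \text{for } n > n_1.$$

Writing

$$r_n = b_n^{-1} \sum_{k=1}^n (b_k - b_{k-1})s_k,$$

we see that

$$|r_n - s| = \frac{1}{b_n}\left|\sum_{k=1}^n (b_k - b_{k-1})(s_k - s)\right|,$$

and, choosing $n > n_1$, we have

$$|r_n - s| \le \left|\frac{1}{b_n}\sum_{k=1}^{n_0} (b_k - b_{k-1})(s_k - s)\right| + \left|\frac{1}{b_n}\sum_{k=n_0+1}^{n} (b_k - b_{k-1})(s_k - s)\right| < \frac{\varepsilon}{2} + \frac{1}{b_n}\sum_{k=n_0+1}^{n} (b_k - b_{k-1})\frac{\varepsilon}{2} < \varepsilon.$$

This completes the proof.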
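Theorem 4. Let $\{X_n\}$ be independent RVs. If $\sum_{n=1}^\infty \operatorname{var}(X_n) < \infty$, then $\sum_{n=1}^\infty (X_n - EX_n)$ converges almost surely.

Proof. Without loss of generality assume that $EX_n = 0$. By Kolmogorov's inequality

$$P\left\{\max_{1 \le k \le n} |S_{m+k} - S_m| \ge \varepsilon\right\} \le \frac{1}{\varepsilon^2}\sum_{k=1}^n \operatorname{var}(X_{m+k}) \le \frac{1}{\varepsilon^2}\sum_{k=m+1}^\infty \operatorname{var}(X_k).$$

Letting $n \to \infty$ we have

$$P\left\{\max_{k \ge m+1} |S_k - S_m| \ge \varepsilon\right\} \le \frac{1}{\varepsilon^2}\sum_{k=m+1}^\infty \operatorname{var}(X_k).$$

It follows that

$$\lim_{m\to\infty} P\left\{\max_{k > m} |S_k - S_m| < \varepsilon\right\} = 1,$$

and since $\varepsilon > 0$ is arbitrary we have

$$P\left\{\lim_{m\to\infty} \left|\sum_{j=m}^\infty X_j\right| = 0\right\} = 1.$$

Consequently, $\sum_{j=1}^\infty X_j$ converges a.s.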


As a corollary we get a version of the SLLN for nonidentically distributed RVs which 
subsumes Theorem 2. 


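Corollary 1. Let $\{X_n\}$ be independent RVs. If

$$\sum_{k=1}^\infty \frac{\operatorname{var}(X_k)}{B_k^2} < \infty, \qquad B_n \uparrow \infty,$$

then

$$\frac{S_n - ES_n}{B_n} \xrightarrow{a.s.} 0.$$

The corollary follows from Theorem 4 and the Kronecker lemma.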
Corollary 2. Every sequence $\{X_n\}$ of independent RVs with uniformly bounded variances
obeys the SLLN.

If $\operatorname{var}(X_k) \le A$ for all $k$, and $B_k = k$, then

$$\sum_{k=1}^\infty \frac{\sigma_k^2}{k^2} \le A\sum_{k=1}^\infty \frac{1}{k^2} < \infty,$$

and it follows that

$$\frac{S_n - ES_n}{n} \xrightarrow{a.s.} 0.$$


Corollary 3 (Borel's Strong Law of Large Numbers). For a sequence of Bernoulli trials
with (constant) probability $p$ of success, the SLLN holds (with $B_n = n$ and $A_n = np$).

Since

$$EX_k = p, \qquad \operatorname{var}(X_k) = p(1-p) \le \frac{1}{4}, \qquad 0 < p < 1,$$

the result follows from Corollary 2.
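Borel's strong law is a statement about whole sample paths: along almost every path, the largest deviation $|S_n/n - p|$ over all $n$ beyond a given index shrinks as that index grows. The following sketch follows one simulated path of Bernoulli trials; the value $p = 0.3$, the path length, the seed, and the helper name `tail_sup_deviation` are illustrative assumptions.

```python
import random


def tail_sup_deviation(p, n_total, n_start, seed=2):
    """Max over n_start <= n <= n_total of |S_n/n - p| along one simulated path."""
    rng = random.Random(seed)
    successes = 0
    worst = 0.0
    for n in range(1, n_total + 1):
        successes += rng.random() < p  # True counts as 1, False as 0
        if n >= n_start:
            worst = max(worst, abs(successes / n - p))
    return worst


# Same seed, hence the same path; only the starting index of the tail changes.
for start in (100, 1_000, 10_000):
    print(start, tail_sup_deviation(0.3, 100_000, start))
```

Because the seed is fixed, all three calls inspect the same path, so the printed tail suprema are nonincreasing in the starting index. A finite path can only suggest the almost sure convergence; it cannot establish it.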


Corollary 4. Let $\{X_n\}$ be iid RVs with common mean $\mu$ and finite variance $\sigma^2$. Then

$$P\left\{\lim_{n\to\infty} \frac{S_n}{n} = \mu\right\} = 1.$$

Remark 1. Kolmogorov's SLLN is much stronger than Corollaries 1 and 4 to Theorem 4.
It states that if $\{X_n\}$ is a sequence of iid RVs then

$$n^{-1}S_n \xrightarrow{a.s.} \mu \iff E|X_1| < \infty,$$

and then $\mu = EX_1$. The proof requires more work and will not be given here. We refer the
reader to Billingsley [6], Chung [15], Feller [26], or Laha and Rohatgi [58].


PROBLEMS 7.4 


1. For the following sequences of independent RVs does the SLLN hold?
(a) $P\{X_k = \pm 2^k\} = \frac{1}{2}$.
(b) $P\{X_k = \pm k\} = 1/(2\sqrt{k})$, $P\{X_k = 0\} = 1 - (1/\sqrt{k})$.
(c) $P\{X_k = \pm 2^k\} = 1/2^{2k+1}$, $P\{X_k = 0\} = 1 - (1/2^{2k})$.

2. Let $X_1, X_2, \ldots$ be a sequence of independent RVs with $\sum_{k=1}^\infty \operatorname{var}(X_k)/k^2 < \infty$. Show
that

$$\frac{1}{n^2}\sum_{k=1}^n \operatorname{var}(X_k) \to 0 \quad \text{as } n \to \infty.$$

Does the converse also hold?

3. For what values of $\alpha$ does the SLLN hold for the sequence

$$P\{X_k = \pm k^\alpha\} = \frac{1}{2}\,?$$

4. Let $\{\sigma_k^2\}$ be a sequence of real numbers such that $\sum_{k=1}^\infty \sigma_k^2/k^2 = \infty$. Show that there
exists a sequence of independent RVs $\{X_k\}$ with $\operatorname{var}(X_k) = \sigma_k^2$, $k = 1, 2, \ldots$, such that
$n^{-1}\sum_{k=1}^n (X_k - EX_k)$ does not converge to 0 almost surely.

[Hint: Let $P\{X_k = \pm k\} = \sigma_k^2/2k^2$, $P\{X_k = 0\} = 1 - (\sigma_k^2/k^2)$ if $\sigma_k/k \le 1$, and
$P\{X_k = \pm\sigma_k\} = \frac{1}{2}$ if $\sigma_k/k > 1$. Apply the Borel-Cantelli lemma to $\{|X_n| \ge n\}$.]

5. Let $X_n$ be a sequence of iid RVs with $E|X_n| = +\infty$. Show that, for every positive
number $A$, $P\{|X_n| > nA \text{ i.o.}\} = 1$ and $P\{|S_n| < nA \text{ i.o.}\} = 1$.

6. Construct an example to show that the converse of Theorem 1(a) does not hold.

7. Investigate a.s. convergence of $\{X_n\}$ to 0 in each case.
(a) $P(X_n = e^n) = 1/n^2$, $P(X_n = 0) = 1 - 1/n^2$.
(b) $P(X_n = 0) = 1 - 1/n$, $P(X_n = \pm 1) = 1/(2n)$.
($X_n$'s are independent in each case.)


7.5 LIMITING MOMENT GENERATING FUNCTIONS 


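Let $X_1, X_2, \ldots$ be a sequence of RVs. Let $F_n$ be the DF of $X_n$, $n = 1, 2, \ldots$, and suppose
that the MGF $M_n(t)$ of $F_n$ exists. What happens to $M_n(t)$ as $n \to \infty$? If it converges, does
it always converge to an MGF?


Example 1. Let $\{X_n\}$ be a sequence of RVs with PMF $P\{X_n = -n\} = 1$, $n = 1, 2, \ldots$. We
have

$$M_n(t) = Ee^{tX_n} = e^{-tn} \to 0 \quad \text{as } n \to \infty \text{ for all } t > 0,$$

$M_n(t) \to +\infty$ for all $t < 0$, and $M_n(t) \to 1$ at $t = 0$.

Thus

$$M_n(t) \to M(t) = \begin{cases} 0, & t > 0, \\ 1, & t = 0, \\ \infty, & t < 0, \end{cases} \quad \text{as } n \to \infty.$$

But $M(t)$ is not an MGF. Note that if $F_n$ is the DF of $X_n$ then

$$F_n(x) = \begin{cases} 0 & \text{if } x < -n, \\ 1 & \text{if } x \ge -n, \end{cases} \quad \to F(x) = 1 \quad \text{for all } x,$$

and $F$ is not a DF.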
Next suppose that $X_n$ has MGF $M_n$ and $X_n \xrightarrow{L} X$, where $X$ is an RV with MGF $M$. Does
$M_n(t) \to M(t)$ as $n \to \infty$? The answer to this question is in the negative.


Example 2. (Curtiss [19]). Consider the DF

$$F_n(x) = \begin{cases} 0, & x \le -n, \\ \frac{1}{2} + c_n \tan^{-1}(nx), & -n < x \le n, \\ 1, & x > n, \end{cases}$$

where $c_n = 1/[2\tan^{-1}(n^2)]$. Clearly, as $n \to \infty$,

$$F_n(x) \to F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$$

at all points of continuity of the DF $F$. The MGF associated with $F_n$ is

$$M_n(t) = \int_{-n}^{n} c_n e^{tx} \frac{n}{1 + n^2x^2}\,dx,$$

which exists for all $t$. The MGF corresponding to $F$ is $M(t) = 1$ for all $t$. But $M_n(t) \not\to M(t)$,
since $M_n(t) \to \infty$ if $t \ne 0$. Indeed


$$M_n(t) > \int_0^n c_n \frac{|t|^3 x^3}{6} \frac{n}{1 + n^2x^2}\,dx.$$


The following result is a weaker version of the continuity theorem due to Lévy and
Cramér. We refer the reader to Lukacs [69, p. 47], or Curtiss [19], for details of the proof.


Theorem 1 (Continuity Theorem). Let $\{F_n\}$ be a sequence of DFs with corresponding
MGFs $\{M_n\}$, and suppose that $M_n(t)$ exists for $|t| \le t_0$ for every $n$. If there exists a DF
$F$ with corresponding MGF $M$ which exists for $|t| \le t_1 < t_0$, such that $M_n(t) \to M(t)$ as
$n \to \infty$ for every $t \in [-t_1, t_1]$, then $F_n \xrightarrow{w} F$.


Example 3. Let $X_n$ be an RV with PMF

$$P\{X_n = 1\} = \frac{1}{n} \quad \text{and} \quad P\{X_n = 0\} = 1 - \frac{1}{n}.$$

Then $M_n(t) = (1/n)e^t + [1 - (1/n)]$ exists for all $t \in \mathbb{R}$, and $M_n(t) \to 1$ as $n \to \infty$ for all $t$.
Here $M(t) = 1$ is the MGF of an RV $X$ degenerate at 0. Thus $X_n \xrightarrow{L} X$.


Remark 1. The following notation on orders of magnitude is quite useful. We write $x_n = o(r_n)$
if, given $\varepsilon > 0$, there exists an $N$ such that $|x_n/r_n| < \varepsilon$ for all $n \ge N$, and $x_n = O(r_n)$
if there exists an $N$ and a constant $c > 0$, such that $|x_n/r_n| < c$ for all $n \ge N$. We write
$x_n = O(1)$ to express the fact that $x_n$ is bounded for large $n$, and $x_n = o(1)$ to mean that
$x_n \to 0$ as $n \to \infty$.

This notation is extended to RVs in an obvious manner. Thus
$X_n = o_p(r_n)$ if, for every $\varepsilon > 0$ and $\delta > 0$, there exists an $N$ such that $P(|X_n/r_n| \le \delta) \ge 1 - \varepsilon$
for $n \ge N$, and $X_n = O_p(r_n)$ if, for $\varepsilon > 0$, there exists a $c > 0$ and an $N$ such that
$P(|X_n/r_n| \le c) \ge 1 - \varepsilon$. We write $X_n = o_p(1)$ to mean $X_n \xrightarrow{P} 0$. This notation can be easily
extended to the case where $r_n$ itself is an RV.


The following lemma is quite useful in applications of Theorem 1. 


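Lemma 1. Let us write $f(x) = o(x)$, if $f(x)/x \to 0$ as $x \to 0$. We have

$$\lim_{n\to\infty} \left\{1 + \frac{a}{n} + o\left(\frac{1}{n}\right)\right\}^n = e^a \quad \text{for every real } a.$$

Proof. By Taylor's expansion we have

$$\begin{aligned}
f(x) &= f(0) + xf'(\theta x) \\
     &= f(0) + xf'(0) + \{f'(\theta x) - f'(0)\}x, \qquad 0 < \theta < 1.
\end{aligned}$$

If $f'(x)$ is continuous at $x = 0$, then as $x \to 0$

$$f(x) = f(0) + xf'(0) + o(x).$$

Taking $f(x) = \log(1 + x)$, we have $f'(x) = (1 + x)^{-1}$, which is continuous at $x = 0$, so that

$$\log(1 + x) = x + o(x).$$

Then for sufficiently large $n$

$$n\log\left\{1 + \frac{a}{n} + o\left(\frac{1}{n}\right)\right\} = n\left\{\frac{a}{n} + o\left(\frac{1}{n}\right) + o\left[\frac{a}{n} + o\left(\frac{1}{n}\right)\right]\right\} = a + n\,o\left(\frac{1}{n}\right) = a + o(1).$$

It follows that

$$\left\{1 + \frac{a}{n} + o\left(\frac{1}{n}\right)\right\}^n = e^{a + o(1)},$$

as asserted.


Example 4. Let $X_1, X_2, \ldots$ be iid $b(1,p)$ RVs. Also, let $S_n = \sum_{k=1}^n X_k$, and let $M_n(t)$ be the
MGF of $S_n$. Then

$$M_n(t) = (q + pe^t)^n \quad \text{for all } t,$$

where $q = 1 - p$. If we let $n \to \infty$ in such a way that $np$ remains constant at $\lambda$, say, then,
by Lemma 1,

$$M_n(t) = \left(1 - \frac{\lambda}{n} + \frac{\lambda}{n}e^t\right)^n = \left\{1 + \frac{\lambda(e^t - 1)}{n}\right\}^n \to \exp\{\lambda(e^t - 1)\} \quad \text{for all } t,$$

which is the MGF of a $P(\lambda)$ RV. Thus, the binomial distribution function approaches the
Poisson DF, provided that $n \to \infty$ in such a way that $np = \lambda > 0$.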
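The limit in Example 4 can be seen directly on the probability mass functions. The sketch below compares the $b(n, \lambda/n)$ and $P(\lambda)$ PMFs for growing $n$; the value $\lambda = 2$, the cutoff `kmax`, and the helper names are illustrative assumptions rather than anything prescribed by the text.

```python
import math


def binom_pmf(k, n, p):
    """P{S_n = k} for S_n ~ b(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P{X = k} for X ~ P(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_pmf_gap(n, lam, kmax=30):
    """Largest pointwise gap between the b(n, lam/n) and P(lam) PMFs on 0..kmax."""
    p = lam / n  # np is held fixed at lam
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(min(n, kmax) + 1))


for n in (10, 100, 1000):
    print(n, max_pmf_gap(n, 2.0))
```

The printed gaps shrink roughly in proportion to $1/n$, in line with the convergence of the MGFs established through Lemma 1.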


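Example 5. Let $X \sim P(\lambda)$. The MGF of $X$ is given by

$$M(t) = \exp\{\lambda(e^t - 1)\} \quad \text{for all } t.$$

Let $Y = (X - \lambda)/\sqrt{\lambda}$. Then the MGF of $Y$ is given by

$$M_Y(t) = e^{-t\sqrt{\lambda}}\,M\left(\frac{t}{\sqrt{\lambda}}\right).$$

Also,

$$\begin{aligned}
\log M_Y(t) &= -t\sqrt{\lambda} + \log M\left(\frac{t}{\sqrt{\lambda}}\right) \\
            &= -t\sqrt{\lambda} + \lambda\left(e^{t/\sqrt{\lambda}} - 1\right) \\
            &= -t\sqrt{\lambda} + \lambda\left(\frac{t}{\sqrt{\lambda}} + \frac{t^2}{2\lambda} + \frac{t^3}{3!\,\lambda^{3/2}} + \cdots\right) \\
            &= \frac{t^2}{2} + \frac{t^3}{3!\,\lambda^{1/2}} + \cdots.
\end{aligned}$$

It follows that

$$\log M_Y(t) \to \frac{t^2}{2} \quad \text{as } \lambda \to \infty,$$

so that $M_Y(t) \to e^{t^2/2}$ as $\lambda \to \infty$, which is the MGF of an $N(0,1)$ RV.
For more examples see Section 7.6.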
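The expansion of $\log M_Y(t)$ in Example 5 says that the gap to $t^2/2$ is of order $t^3/(6\sqrt{\lambda})$. A minimal numerical check, assuming $t = 1$ and a few illustrative values of $\lambda$ (the function name is hypothetical):

```python
import math


def log_mgf_standardized_poisson(t, lam):
    """log M_Y(t) = -t*sqrt(lam) + lam*(exp(t/sqrt(lam)) - 1) for Y = (X - lam)/sqrt(lam)."""
    root = math.sqrt(lam)
    # expm1 keeps exp(u) - 1 accurate when u = t/sqrt(lam) is small
    return -t * root + lam * math.expm1(t / root)


t = 1.0
for lam in (1.0, 100.0, 1e4, 1e6):
    gap = log_mgf_standardized_poisson(t, lam) - t * t / 2.0
    leading = t**3 / (6.0 * math.sqrt(lam))  # first neglected term of the series
    print(lam, gap, leading)
```

The gap and the leading term should agree more and more closely as $\lambda$ grows, and both tend to 0.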


Remark 2. As pointed out earlier, working with MGFs has the disadvantage that the existence
of MGFs is a very strong condition. Working with CFs, which always exist, on the
other hand, permits a much wider application of the continuity theorem. Let $\phi_n$ be the CF
of $F_n$. Then $F_n \xrightarrow{w} F$ if and only if $\phi_n \to \phi$ as $n \to \infty$ on $\mathbb{R}$, where $\phi$ is continuous at
$t = 0$. In this case $\phi$, the limit function, is the CF of the limit DF $F$.
Example 6. Let $X$ be a $C(1,0)$ RV. Then its CF is given by

$$\begin{aligned}
E\exp(itX) &= \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\cos tx}{1 + x^2}\,dx + i\,\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\sin tx}{1 + x^2}\,dx \\
           &= \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\cos tx}{1 + x^2}\,dx = e^{-|t|},
\end{aligned}$$

since the second integral on the right side vanishes.
Let $\{X_n\}$ be iid RVs with common law $\mathcal{L}(X)$ and set $Y_n = \sum_{j=1}^n X_j/n$. Then the CF of
$Y_n$ is given by

$$\phi_n(t) = E\exp\left\{it\sum_{j=1}^n X_j/n\right\} = \prod_{j=1}^n \exp\left\{-\frac{|t|}{n}\right\} = \exp(-|t|)$$

for all $n$. It follows that $\phi_n$ is the CF of a $C(1,0)$ RV. We could not have derived this result
using MGFs. Also if $U_n = \sum_{j=1}^n X_j/n^\alpha$ for $\alpha > 1$, then

$$\phi_{U_n}(t) = \exp\{-|t|/n^{\alpha-1}\} \to 1$$

as $n \to \infty$ for all $t$. Since $\phi(t) = 1$ is continuous at $t = 0$, $\phi$ is the CF of the limit
DF $F$. Clearly $F$ is the DF of an RV degenerate at 0. Thus $\sum_{j=1}^n X_j/n^\alpha \xrightarrow{L} U$, where
$P(U = 0) = 1$.
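Since $Y_n$ in Example 6 is again $C(1,0)$ for every $n$, averaging does not concentrate the distribution: $P\{|Y_n| > 1\} = 1/2$ whatever $n$ is. The sketch below estimates this probability by simulation; the sample sizes, replication count, seed, and the helper name `frac_mean_outside` are illustrative assumptions.

```python
import math
import random


def frac_mean_outside(n, reps, seed=3):
    """Fraction of replications with |Y_n| > 1, Y_n the mean of n C(1,0) draws."""
    rng = random.Random(seed)
    outside = 0
    for _ in range(reps):
        total = 0.0
        for _ in range(n):
            # Inverse-DF draw from the standard Cauchy law: tan(pi * (U - 1/2))
            total += math.tan(math.pi * (rng.random() - 0.5))
        outside += abs(total / n) > 1.0
    return outside / reps


for n in (1, 20, 200):
    print(n, frac_mean_outside(n, reps=2_000))
```

All three estimates should stay near 0.5, in contrast with the finite-mean case, where the corresponding fraction would tend to 0 or 1.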


PROBLEMS 7.5 


1. Let $X \sim NB(r; p)$. Show that

$$2pX \xrightarrow{L} Y \quad \text{as } p \to 0,$$

where $Y \sim \chi^2(2r)$.

2. Let $X_n \sim NB(r_n; 1 - p_n)$, $n = 1, 2, \ldots$. Show that $X_n \xrightarrow{L} X$ as $r_n \to \infty$, $p_n \to 0$, in such
a way that $r_n p_n \to \lambda$, where $X \sim P(\lambda)$.

3. Let $X_1, X_2, \ldots$ be independent RVs with PMF given by $P\{X_n = \pm 1\} = \frac{1}{2}$, $n = 1, 2, \ldots$.
Let $Z_n = \sum_{j=1}^n X_j/2^j$. Show that $Z_n \xrightarrow{L} Z$, where $Z \sim U[-1, 1]$.

4. Let $\{X_n\}$ be a sequence of RVs with $X_n \sim G(n, \beta)$ where $\beta > 0$ is a constant
(independent of $n$). Find the limiting distribution of $X_n/n$.

5. Let $X_n \sim \chi^2(n)$, $n = 1, 2, \ldots$. Find the limiting distribution of $X_n/n^2$.

6. Let $X_1, X_2, \ldots, X_n$ be jointly normal with $EX_i = 0$, $EX_i^2 = 1$ for all $i$ and $\operatorname{cov}(X_i, X_j) = \rho$,
$i, j = 1, 2, \ldots$ ($i \ne j$). What is the limiting distribution of $n^{-1}S_n$, where $S_n = \sum_{k=1}^n X_k$?


7.6 CENTRAL LIMIT THEOREM


Let $X_1, X_2, \ldots$ be a sequence of RVs, and let $S_n = \sum_{k=1}^n X_k$, $n = 1, 2, \ldots$. In Sections 7.3
and 7.4 we investigated the convergence of the sequence of RVs $B_n^{-1}(S_n - A_n)$ to
the degenerate RV. In this section we examine the convergence of $B_n^{-1}(S_n - A_n)$ to a
nondegenerate RV. Suppose that, for a suitable choice of constants $A_n$ and $B_n > 0$, the RVs
$B_n^{-1}(S_n - A_n) \xrightarrow{L} Y$. What are the properties of this limit RV $Y$? The question as posed is
far too general and is not of much interest unless the RVs $X_i$ are suitably restricted. For
example, if we take $X_1$ with DF $F$ and $X_2, X_3, \ldots$ to be 0 with probability 1, choosing $A_n = 0$
and $B_n = 1$ leads to $F$ as the limit DF.

We recall (Example 7.5.6) that, if $X_1, X_2, \ldots, X_n$ are iid RVs with common law $C(1,0)$,
then $n^{-1}S_n$ is also $C(1,0)$. Again, if $X_1, X_2, \ldots, X_n$ are iid $N(0,1)$ RVs then $n^{-1/2}S_n$ is
also $N(0,1)$ (Corollary 2 to Theorem 5.3.22). We note thus that for certain sequences of
RVs there exist sequences $A_n$ and $B_n > 0$, $B_n \to \infty$, such that $B_n^{-1}(S_n - A_n) \xrightarrow{L} Y$. In the
Cauchy case $B_n = n$, $A_n = 0$, and in the normal case $B_n = n^{1/2}$, $A_n = 0$. Moreover, we see
that Cauchy and normal distributions appear as limiting distributions, in these two cases
because of the reproductive nature of the distributions. Cauchy and normal distributions
are examples of stable distributions.


Definition 1. Let $X_1, X_2$ be iid nondegenerate RVs with common DF $F$. Let $a_1, a_2$ be any
positive constants. We say that $F$ is stable if there exist constants $A$ and $B$ (depending on
$a_1, a_2$) such that the RV $B^{-1}(a_1X_1 + a_2X_2 - A)$ also has the DF $F$.


Let X1,X2,... be iid RVs with common DF F. We remark without proof (see Loéve [66, 
p. 339]) that only stable distributions occur as limits. To make this statement more precise 
we make the following definition. 


Definition 2. Let $X_1, X_2, \ldots$ be iid RVs with common DF $F$. We say that $F$ belongs to
the domain of attraction of a distribution $V$ if there exist norming constants $B_n > 0$ and
centering constants $A_n$ such that, as $n \to \infty$,

$$P\{B_n^{-1}(S_n - A_n) \le x\} \to V(x), \tag{1}$$

at all continuity points $x$ of $V$.


In view of the statement after Definition 1, we see that only stable distributions possess
domains of attraction. From Definition 1 we also note that each stable law belongs to its
own domain of attraction. The study of stable distributions is beyond the scope of this
book. We shall restrict ourselves to seeking conditions under which the limit law $V$ is the
normal distribution. The importance of the normal distribution in statistics is due largely
to the fact that a wide class of distributions $F$ belongs to the domain of attraction of the
normal law. Let us consider some examples.


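Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $b(1,p)$ RVs. Let

$$S_n = \sum_{k=1}^n X_k, \qquad A_n = ES_n = np, \qquad B_n = \sqrt{\operatorname{var}(S_n)} = \sqrt{np(1-p)}.$$

Then, writing $q = 1 - p$, the MGF of $B_n^{-1}(S_n - A_n)$ is

$$\begin{aligned}
M_n(t) &= E\exp\left\{\frac{(S_n - np)t}{\sqrt{npq}}\right\} = \exp\left(-\frac{npt}{\sqrt{npq}}\right)\left\{q + p\exp\left(\frac{t}{\sqrt{npq}}\right)\right\}^n \\
       &= \left\{q\exp\left(-\frac{pt}{\sqrt{npq}}\right) + p\exp\left(\frac{qt}{\sqrt{npq}}\right)\right\}^n \\
       &= \left[1 + \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right]^n.
\end{aligned}$$

It follows from Lemma 7.5.1 that

$$M_n(t) \to e^{t^2/2} \quad \text{as } n \to \infty,$$

and since $e^{t^2/2}$ is the MGF of an $N(0,1)$ RV, we have by the continuity theorem

$$P\left\{\frac{S_n - np}{\sqrt{npq}} \le x\right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt \quad \text{for all } x \in \mathbb{R}.$$

In particular, we note that for each $x \in \mathbb{R}$, $F_n^*(x) \xrightarrow{P} F(x)$ as $n \to \infty$ and

$$\frac{\sqrt{n}\,[F_n^*(x) - F(x)]}{\sqrt{F(x)(1 - F(x))}} \xrightarrow{L} Z \quad \text{as } n \to \infty,$$

where $Z$ is $N(0,1)$. It is possible to make a probability statement simultaneously for all $x$.
This is the so-called Glivenko-Cantelli theorem: $F_n^*(x)$ converges uniformly to $F(x)$. For
a proof, we refer to Fisz [31, p. 391].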
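The convergence of the standardized binomial DF to the normal DF in Example 1 can be quantified by the largest vertical gap between the two. The sketch below computes that gap exactly, up to floating-point error, by checking both one-sided limits at every jump of the binomial DF; the choice $p = 0.3$ and the helper names are illustrative assumptions.

```python
import math


def phi(x):
    """Standard normal DF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binom_pmf(k, n, p):
    """b(n, p) PMF computed on the log scale to avoid overflow for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def sup_distance(n, p):
    """sup over x of |P{(S_n - np)/sqrt(npq) <= x} - Phi(x)|, found at the jumps."""
    mean, sd = n * p, math.sqrt(n * p * (1.0 - p))
    cdf_left, worst = 0.0, 0.0
    for k in range(n + 1):
        cdf_right = cdf_left + binom_pmf(k, n, p)
        target = phi((k - mean) / sd)
        # A step function is farthest from a continuous increasing one at a jump
        worst = max(worst, abs(cdf_left - target), abs(cdf_right - target))
        cdf_left = cdf_right
    return worst


for n in (10, 100, 1000):
    print(n, sup_distance(n, 0.3))
```

The printed distances decrease with $n$, roughly at the rate $1/\sqrt{n}$, which is what the pointwise limit in Example 1 leads one to expect.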


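Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $\chi^2(1)$ RVs. Then $S_n \sim \chi^2(n)$, $ES_n = n$, and $\operatorname{var}(S_n) = 2n$.
Also let $Z_n = (S_n - n)/\sqrt{2n}$; then

$$\begin{aligned}
M_n(t) = Ee^{tZ_n} &= \exp\left(-t\sqrt{n/2}\right)\left(1 - \frac{2t}{\sqrt{2n}}\right)^{-n/2}, \qquad 2t < \sqrt{2n}, \\
 &= \left[\exp\left(t\sqrt{2/n}\right) - t\sqrt{2/n}\,\exp\left(t\sqrt{2/n}\right)\right]^{-n/2}, \qquad t < \sqrt{n/2}.
\end{aligned}$$

Using Taylor's approximation, we get

$$\exp\left(t\sqrt{2/n}\right) = 1 + t\sqrt{2/n} + \frac{t^2}{2}\left(\sqrt{2/n}\right)^2 + \frac{1}{6}\exp(\theta_n)\left(t\sqrt{2/n}\right)^3,$$

where $0 < \theta_n < t\sqrt{2/n}$. It follows that

$$M_n(t) = \left(1 - \frac{t^2}{n} + \frac{\zeta(n)}{n}\right)^{-n/2},$$

where

$$\zeta(n) = -\sqrt{\frac{2}{n}}\,t^3 + \left(\frac{t^3}{3}\sqrt{\frac{2}{n}} - \frac{2t^4}{3n}\right)\exp(\theta_n) \to 0 \quad \text{as } n \to \infty,$$

for every fixed $t$. We have from Lemma 7.5.1 that $M_n(t) \to e^{t^2/2}$ as $n \to \infty$ for all real $t$, and
it follows that $Z_n \xrightarrow{L} Z$, where $Z$ is $N(0,1)$.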


These examples suggest that if we take iid RVs with finite variance and take $A_n = ES_n$,
$B_n = \sqrt{\operatorname{var}(S_n)}$, then $B_n^{-1}(S_n - A_n) \xrightarrow{L} Z$, where $Z$ is $N(0,1)$. This is the central limit result,
which we now prove. The reader should note that in both Examples 1 and 2 we used more
than just the existence of $E|X|^2$. Indeed, the MGF exists and hence moments of all orders
exist. The existence of the MGF is not a necessary condition.


Theorem 1 (Lindeberg-Lévy Central Limit Theorem). Let $\{X_n\}$ be a sequence of iid
RVs with $0 < \operatorname{var}(X_n) = \sigma^2 < \infty$ and common mean $\mu$. Let $S_n = \sum_{j=1}^n X_j$, $n = 1, 2, \ldots$.
Then for every $x \in \mathbb{R}$

$$\lim_{n\to\infty} P\left\{\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right\} = \lim_{n\to\infty} P\left\{\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le x\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du.$$

Proof. The proof we give here assumes that the MGF of $X_n$ exists. Without loss of generality,
we also assume that $EX_n = 0$ and $\operatorname{var}(X_n) = 1$. Let $M$ be the MGF of $X_n$. Then the
MGF of $S_n/\sqrt{n}$ is given by

$$M_n(t) = E\exp(tS_n/\sqrt{n}) = [M(t/\sqrt{n})]^n$$

and

$$\ln M_n(t) = n\ln M(t/\sqrt{n}) = \frac{\ln M(t/\sqrt{n})}{1/n} = \frac{L(t/\sqrt{n})}{1/n},$$

where $L(t/\sqrt{n}) = \ln M(t/\sqrt{n})$. Clearly $L(0) = \ln(1) = 0$, so that as $n \to \infty$, the conditions
for L'Hospital's rule are satisfied. It follows that

$$\lim_{n\to\infty} \ln M_n(t) = \lim_{n\to\infty} \frac{L'(t/\sqrt{n})\,t}{2/\sqrt{n}},$$

and since $L'(0) = EX = 0$, we can use L'Hospital's rule once again to get

$$\lim_{n\to\infty} \ln M_n(t) = \lim_{n\to\infty} \frac{L''(t/\sqrt{n})\,t^2}{2} = \frac{t^2}{2},$$

using $L''(0) = \operatorname{var}(X) = 1$. Thus

$$M_n(t) \to \exp(t^2/2) = M(t),$$

where $M(t)$ is the MGF of an $N(0,1)$ RV.


Remark 1. In the proof above we could have used the Taylor series expansion of M to 
arrive at the same result. 


Remark 2. Even though we proved Theorem 1 for the case when the MGF of the $X_n$'s exists,
we will use the result whenever $0 < \operatorname{var}(X_n) = \sigma^2 < \infty$. The use of CFs would have provided
a complete proof of Theorem 1. Let $\phi$ be the CF of $X_n$. Assuming again, without loss of
generality, that $EX_n = 0$, $\operatorname{var}(X_n) = 1$, we can write

$$\phi(t) = 1 - \tfrac{1}{2}t^2 + t^2 o(1).$$

Thus the CF of $S_n/\sqrt{n}$ is

$$[\phi(t/\sqrt{n})]^n = \left[1 - \frac{t^2}{2n} + \frac{t^2}{n}o(1)\right]^n,$$

which converges to $\exp(-t^2/2)$, the CF of an $N(0,1)$ RV. The devil is in the details
of the proof.


The following converse to Theorem | holds. 


Theorem 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs such that $n^{-1/2}S_n$ has the same distribution for
every $n = 1, 2, \ldots$. Then, if $EX_i = 0$, $\operatorname{var}(X_i) = 1$, the distribution of $X_i$ must be $N(0,1)$.


Proof. Let $F$ be the DF of $n^{-1/2}S_n$. By the central limit theorem,

$$\lim_{n\to\infty} P\{n^{-1/2}S_n \le x\} = \Phi(x).$$

Also, $P\{n^{-1/2}S_n \le x\} = F(x)$ for each $n$. It follows that we must have $F(x) = \Phi(x)$.


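Example 3. Let $X_1, X_2, \ldots$ be iid RVs with common PMF

$$P\{X = k\} = p(1-p)^k, \qquad k = 0, 1, 2, \ldots,\ 0 < p < 1,\ q = 1 - p.$$

Then $EX = q/p$, $\operatorname{var}(X) = q/p^2$. By Theorem 1 we see that

$$P\left\{\frac{S_n - n(q/p)}{\sqrt{nq}}\,p \le x\right\} \to \Phi(x) \quad \text{as } n \to \infty \text{ for all } x \in \mathbb{R}.$$


Example 4. Let $X_1, X_2, \ldots$ be iid RVs with common $B(\alpha, \beta)$ distribution. Then

$$EX = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \operatorname{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$$

By the corollary to Theorem 1, it follows that

$$\frac{S_n - n[\alpha/(\alpha + \beta)]}{\sqrt{\alpha\beta n/[(\alpha + \beta)^2(\alpha + \beta + 1)]}} \xrightarrow{L} Z,$$

where $Z$ is $N(0,1)$.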


For nonidentically distributed RVs we state, without proof, the following result due to 
Lindeberg. 


Theorem 3. Let $X_1, X_2, \ldots$ be independent RVs with DFs $F_1, F_2, \ldots$, respectively. Let
$EX_k = \mu_k$ and $\operatorname{var}(X_k) = \sigma_k^2$, and write

$$s_n^2 = \sum_{j=1}^n \sigma_j^2.$$

If the $F_k$'s are absolutely continuous with PDF $f_k$, assume that the relation

$$\lim_{n\to\infty} \frac{1}{s_n^2}\sum_{k=1}^n \int_{|x - \mu_k| > \varepsilon s_n} (x - \mu_k)^2 f_k(x)\,dx = 0 \tag{2}$$

holds for all $\varepsilon > 0$. (A similar condition can be stated for the discrete case.) Then

$$S_n^* = \frac{\sum_{j=1}^n X_j - \sum_{j=1}^n \mu_j}{s_n} \xrightarrow{L} Z \sim N(0,1). \tag{3}$$

Condition (2) is known as the Lindeberg condition.


Feller [24] has shown that condition (2) is necessary as well in the following sense. For
independent RVs $\{X_k\}$ for which (3) holds and

$$P\left\{\max_{1 \le k \le n} |X_k - EX_k| > \varepsilon\sqrt{\operatorname{var}(S_n)}\right\} \to 0,$$

(2) holds for every $\varepsilon > 0$.


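Example 5. Let $X_1, X_2, \ldots$ be independent RVs such that $X_k$ is $U(-a_k, a_k)$. Then $EX_k = 0$,
$\operatorname{var}(X_k) = (1/3)a_k^2$. Suppose that $|a_k| < a$ and $\sum_{1}^{n} a_k^2 \to \infty$ as $n \to \infty$. Then

$$\frac{1}{s_n^2}\sum_{k=1}^n \int_{|x| > \varepsilon s_n} x^2\frac{1}{2a_k}\,dx \le \frac{a^2}{s_n^2}\sum_{k=1}^n P\{|X_k| > \varepsilon s_n\} \le \frac{a^2}{s_n^2}\sum_{k=1}^n \frac{\operatorname{var}(X_k)}{\varepsilon^2 s_n^2} = \frac{a^2}{\varepsilon^2 s_n^2} \to 0 \quad \text{as } n \to \infty.$$

If $\sum_{1}^{\infty} a_k^2 < \infty$, then $s_n^2 \uparrow A^2$, say, as $n \to \infty$. For fixed $k$, we can find $\varepsilon_k$ such that
$\varepsilon_k A < a_k$ and then $P\{|X_k| > \varepsilon_k s_n\} \ge P\{|X_k| > \varepsilon_k A\} > 0$. For $n \ge k$, we have

$$\begin{aligned}
\frac{1}{s_n^2}\sum_{j=1}^n \int_{|x| > \varepsilon_k s_n} x^2 f_j(x)\,dx &\ge \frac{s_n^2\varepsilon_k^2}{s_n^2}\sum_{j=1}^n P\{|X_j| > \varepsilon_k s_n\} \\
 &\ge \varepsilon_k^2 P\{|X_k| > \varepsilon_k s_n\} \\
 &> 0,
\end{aligned}$$

so that the Lindeberg condition does not hold. Indeed, if $X_1, X_2, \ldots$ are independent RVs
such that there exists a constant $A$ with $P\{|X_n| \le A\} = 1$ for all $n$, the Lindeberg condition (2)
is satisfied if $s_n^2 \to \infty$ as $n \to \infty$. To see this, suppose that $s_n^2 \to \infty$. Since the $X_k$'s
are uniformly bounded, so are the RVs $X_k - EX_k$. It follows that for every $\varepsilon > 0$ we can
find an $N_\varepsilon$ such that, for $n \ge N_\varepsilon$, $P\{|X_k - EX_k| < \varepsilon s_n,\ k = 1, 2, \ldots, n\} = 1$. The Lindeberg
condition follows immediately. The converse also holds, for, if $\lim_{n\to\infty} s_n^2 < \infty$ and the
Lindeberg condition holds, there exists a constant $A < \infty$ such that $s_n^2 \to A^2$. For any fixed
$j$, we can find an $\varepsilon > 0$ such that $P\{|X_j - \mu_j| > \varepsilon A\} > 0$. Then, for $n \ge j$,

$$\begin{aligned}
\frac{1}{s_n^2}\sum_{k=1}^n \int_{|x - \mu_k| > \varepsilon s_n} (x - \mu_k)^2 f_k(x)\,dx &\ge \varepsilon^2\sum_{k=1}^n P\{|X_k - \mu_k| > \varepsilon s_n\} \\
 &\ge \varepsilon^2 P\{|X_j - \mu_j| > \varepsilon A\} \\
 &> 0,
\end{aligned}$$

and the Lindeberg condition does not hold. This contradiction shows that $s_n^2 \to \infty$ is also
a necessary condition; that is, for a sequence of uniformly bounded independent RVs, a
necessary and sufficient condition for the central limit theorem to hold is $s_n^2 \to \infty$ as
$n \to \infty$.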


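Example 6. Let $X_1, X_2, \ldots$ be independent RVs such that $\alpha_k = E|X_k|^{2+\delta} < \infty$ for some
$\delta > 0$ and $\alpha_1 + \alpha_2 + \cdots + \alpha_n = o(s_n^{2+\delta})$. Then the Lindeberg condition is satisfied, and the
central limit theorem holds. This result is due to Lyapunov. We have

$$\begin{aligned}
\frac{1}{s_n^2}\sum_{k=1}^n \int_{|x| > \varepsilon s_n} x^2 f_k(x)\,dx &\le \frac{1}{\varepsilon^\delta s_n^{2+\delta}}\sum_{k=1}^n \int_{-\infty}^{\infty} |x|^{2+\delta} f_k(x)\,dx \\
 &= \frac{\sum_{k=1}^n \alpha_k}{\varepsilon^\delta s_n^{2+\delta}} \to 0 \quad \text{as } n \to \infty.
\end{aligned}$$

A similar argument applies in the discrete case.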
Remark 3. Both the central limit theorem (CLT) and the (weak) law of large numbers
(WLLN) hold for a large class of sequences of RVs $\{X_n\}$. If the $\{X_n\}$ are independent
uniformly bounded RVs, that is, if $P\{|X_n| \le M\} = 1$, the WLLN (Theorem 7.3.1) holds;
the CLT holds provided that $s_n^2 \to \infty$ (Example 5).

If the RVs $\{X_n\}$ are iid, then the CLT is a stronger result than the WLLN in that the
former provides an estimate of the probability $P\{|S_n - n\mu|/n > \varepsilon\}$. Indeed,

$$P\{|S_n - n\mu| > n\varepsilon\} = P\left\{\frac{|S_n - n\mu|}{\sigma\sqrt{n}} > \frac{\varepsilon}{\sigma}\sqrt{n}\right\} \approx 1 - P\left\{|Z| \le \frac{\varepsilon}{\sigma}\sqrt{n}\right\},$$

where $Z$ is $N(0,1)$, and the law of large numbers follows. On the other hand, we note that
the WLLN does not require the existence of a second moment.


Remark 4. If $\{X_n\}$ are independent RVs, it is quite possible that the CLT may apply to the
$X_n$'s, but not the WLLN.


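Example 7 (Feller [25, p. 255]). Let $\{X_k\}$ be independent RVs with PMF

$$P\{X_k = k^\lambda\} = P\{X_k = -k^\lambda\} = \frac{1}{2}, \qquad k = 1, 2, \ldots.$$

Then $EX_k = 0$, $\operatorname{var}(X_k) = k^{2\lambda}$. Also let $\lambda > 0$; then

$$s_n^2 = \sum_{k=1}^n k^{2\lambda} \le \int_0^{n+1} x^{2\lambda}\,dx = \frac{(n+1)^{2\lambda+1}}{2\lambda + 1}.$$

It follows that, if $0 < \lambda < \frac{1}{2}$, $s_n/n \to 0$, and by Corollary 2 to Theorem 7.3.1 the WLLN
holds. Now $k^\lambda \le n^\lambda$ for $k \le n$, so that the sum $\sum_{k=1}^n \sum_{|x_{kl}| > \varepsilon s_n} x_{kl}^2 p_{kl}$ will be nonzero if $n^\lambda > \varepsilon s_n \approx
\varepsilon[n^{\lambda + 1/2}/\sqrt{2\lambda + 1}]$. It follows that, as long as $n > (2\lambda + 1)\varepsilon^{-2}$,

$$\frac{1}{s_n^2}\sum_{k=1}^n \sum_{|x_{kl}| > \varepsilon s_n} x_{kl}^2 p_{kl} = 0$$

and the Lindeberg condition holds. Thus the CLT holds for $\lambda > 0$. This means that

$$P\left\{a < \sqrt{\frac{2\lambda + 1}{n^{2\lambda+1}}}\,S_n < b\right\} \to \int_a^b \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt.$$

Thus

$$P\left\{\frac{a\,n^{\lambda + 1/2 - 1}}{\sqrt{2\lambda + 1}} < \frac{S_n}{n} < \frac{b\,n^{\lambda + 1/2 - 1}}{\sqrt{2\lambda + 1}}\right\} \to \int_a^b \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt,$$

and the WLLN cannot hold for $\lambda \ge \frac{1}{2}$.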
We conclude this section with some remarks concerning the application of the CLT.
Let $X_1, X_2, \ldots$ be iid RVs with common mean $\mu$ and variance $\sigma^2$. Let us write

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}},$$

and let $z_1, z_2$ be two arbitrary real numbers with $z_1 < z_2$. If $F_n$ is the DF of $Z_n$, then

$$\lim_{n\to\infty} P\{z_1 < Z_n \le z_2\} = \lim_{n\to\infty} [F_n(z_2) - F_n(z_1)] = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-t^2/2}\,dt,$$

that is,

$$\lim_{n\to\infty} P\{z_1\sigma\sqrt{n} + n\mu < S_n \le z_2\sigma\sqrt{n} + n\mu\} = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-t^2/2}\,dt. \tag{4}$$

It follows that the RV $S_n = \sum_{k=1}^n X_k$ is asymptotically normally distributed with mean $n\mu$
and variance $n\sigma^2$. Equivalently, the RV $n^{-1}S_n$ is asymptotically $N(\mu, \sigma^2/n)$. This result is
of great importance in statistics.

In Fig. 1 we show the distribution of $\bar{X}$ in sampling from $P(\lambda)$ and $G(1,1)$. We have
also superimposed, in each case, the graph of the corresponding normal approximation.

How large should n be before we apply approximation (4)? Unfortunately the answer 
is not simple. Much depends on the underlying distribution, the corresponding speed of 
convergence, and the accuracy one desires. There is a vast amount of literature on the 
speed of convergence and error bounds. We will content ourselves with some examples. 
The reader is referred to Rohatgi [90] for a detailed discussion. 

In the discrete case when the underlying distribution is integer-valued, approximation
(4) is improved by applying the continuity correction. If $X$ is integer-valued, then for
integers $x_1, x_2$

$$P\{x_1 \le X \le x_2\} = P\{x_1 - 1/2 < X < x_2 + 1/2\},$$

which amounts to making the discrete space of values of $X$ continuous by considering
intervals of length 1 with midpoints at integers.


Example 8. Let $X_1, X_2, \ldots, X_n$ be iid $b(1,p)$ RVs. Then $ES_n = np$ and $\operatorname{var}(S_n) = np(1-p)$
so $(S_n - np)/\sqrt{np(1-p)}$ is approximately $N(0,1)$.
Suppose $n = 10$, $p = 1/2$. Then from binomial tables $P(X \le 4) = 0.3770$. Using normal
approximation without continuity correction

$$P(X \le 4) \approx P\left(Z \le \frac{4 - 5}{\sqrt{2.5}}\right) = P(Z \le -0.63) = 0.2643.$$

Applying continuity correction,

$$P(X \le 4) = P(X < 4.5) \approx P(Z < -0.32) = 0.3745.$$
Fig. 1 (a) Distribution of $\bar{X}$ for Poisson RV with mean 3 and normal approximation and (b) distribution
of $\bar{X}$ for exponential RV with mean 1 and normal approximation (exact density and approximation).
Next suppose that $n = 100$, $p = 0.1$. Then from binomial tables $P(X = 7) = 0.0889$.
Using normal approximation, without continuity correction

$$P(X = 7) = P(6.0 < X < 8.0) \approx P(-1.33 < Z < -0.67) = 0.1596$$

and with continuity correction

$$P(X = 7) = P(6.5 < X < 7.5) \approx P(-1.17 < Z < -0.83) = 0.0823.$$

The rule of thumb is to use continuity correction, and use normal approximation whenever
$np(1-p) > 10$, and use Poisson approximation with $\lambda = np$ for $p < 0.1$, $\lambda < 10$.
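The numbers in Example 8 can be recomputed with a few lines of code. This is a sketch under one stated assumption: the standardized values are rounded to two decimals, as they would be for a table lookup, so that the results line up with the tabled values quoted in the example. The helper names are hypothetical.

```python
import math


def phi(x):
    """Standard normal DF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binom_pmf(k, n, p):
    """P{X = k} for X ~ b(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def z(x, n, p):
    """Standardize x and round to two decimals, mimicking a normal-table lookup."""
    return round((x - n * p) / math.sqrt(n * p * (1.0 - p)), 2)


# n = 10, p = 1/2: P(X <= 4)
exact_a = sum(binom_pmf(k, 10, 0.5) for k in range(5))
plain_a = phi(z(4, 10, 0.5))          # no continuity correction
corrected_a = phi(z(4.5, 10, 0.5))    # with continuity correction

# n = 100, p = 0.1: P(X = 7)
exact_b = binom_pmf(7, 100, 0.1)
plain_b = phi(z(8, 100, 0.1)) - phi(z(6, 100, 0.1))
corrected_b = phi(z(7.5, 100, 0.1)) - phi(z(6.5, 100, 0.1))

print(exact_a, plain_a, corrected_a)
print(exact_b, plain_b, corrected_b)
```

In both cases the continuity-corrected value is much closer to the exact binomial probability than the uncorrected one. Without the two-decimal rounding the normal values shift slightly in the third decimal place.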


Example 9. Let $X_1, X_2, \ldots$ be iid $P(\lambda)$ RVs. Then $S_n$ has approximately an $N(n\lambda, n\lambda)$ distribution
for large $n$. Let $n = 64$, $\lambda = 0.125$. Then $S_n \sim P(8)$ and from Poisson distribution
tables $P(S_n = 10) = 0.099$. Using normal approximation

$$P(S_n = 10) = P(9.5 < S_n < 10.5) \approx P(0.53 < Z < 0.88) = 0.1087.$$

If $n = 96$, $\lambda = 0.125$, then $S_n \sim P(12)$ and

$$P(S_n = 10) = 0.105, \quad \text{exact},$$

$$P(S_n = 10) \approx 0.1009, \quad \text{normal approximation}.$$
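The first case of Example 9 ($n = 64$, so that $S_n \sim P(8)$) can be recomputed as follows. As a stated assumption, the standardized endpoints are rounded to two decimals to mirror a table lookup; the variable names are illustrative.

```python
import math


def phi(x):
    """Standard normal DF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


lam = 8.0  # n * lambda = 64 * 0.125
exact = math.exp(-lam) * lam**10 / math.factorial(10)  # Poisson P(S_n = 10)

# Continuity-corrected normal approximation with mean and variance both equal to lam
z_low = round((9.5 - lam) / math.sqrt(lam), 2)
z_high = round((10.5 - lam) / math.sqrt(lam), 2)
approx = phi(z_high) - phi(z_low)

print(exact, z_low, z_high, approx)
```

The same three lines of arithmetic, with `lam` changed, can be used to examine other values of $n\lambda$.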


PROBLEMS 7.6 


1. Let {X,,} be a sequence of independent RVs with the following distributions. In each 
case, does the Lindeberg condition hold? 


(a) $P\{X_n = \pm(1/2^n)\} = \frac{1}{2}$.
(b) PG = 1/2 Pie, = 0 ea 1 — (i /e), 
(c) PIX, = 21a (12) 2 PX, = ee a 2 
(d) $\{X_n\}$ is a sequence of independent Poisson RVs with parameter $\lambda_n$, $n = 1, 2, \ldots$,
such that $\sum_{k=1}^n \lambda_k \to \infty$.
(e) $P\{X_n = \pm 2^n\} = \frac{1}{2}$.
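2. Let $X_1, X_2, \ldots$ be iid RVs with mean 0, variance 1, and $EX_i^4 < \infty$. Find the limiting
distribution of

$$Z_n = \sqrt{n}\,\frac{X_1X_2 + X_3X_4 + \cdots + X_{2n-1}X_{2n}}{X_1^2 + X_2^2 + \cdots + X_{2n}^2}.$$

3. Let $X_1, X_2, \ldots$ be iid RVs with mean $\alpha$ and variance $\sigma^2$, and let $Y_1, Y_2, \ldots$ be iid
RVs with mean $\beta$ ($\ne 0$) and variance $\tau^2$. Find the limiting distribution of $Z_n =
\sqrt{n}(\bar{X}_n - \alpha)/\bar{Y}_n$, where $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$ and $\bar{Y}_n = n^{-1}\sum_{i=1}^n Y_i$.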
4. Let $X \sim b(n, \theta)$. Use the CLT to find $n$ such that $P_\theta\{X > n/2\} > 1 - \alpha$. In particular,
let $\alpha = 0.10$ and $\theta = 0.45$. Calculate $n$, satisfying $P\{X > n/2\} > 0.90$.

5. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common mean $\mu$ and variance $\sigma^2$. Also,
let $\bar{X} = n^{-1}\sum_{k=1}^n X_k$ and $S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar{X})^2$. Show that $\sqrt{n}(\bar{X} - \mu)/S \xrightarrow{L} Z$,
where $Z \sim N(0,1)$.

6. Let $X_1, X_2, \ldots, X_{100}$ be iid RVs with mean 75 and variance 225. Use Chebychev's
inequality to calculate the probability that the sample mean will not differ from the
population mean by more than 6. Then use the CLT to calculate the same probability
and compare your results.

7. Let $X_1, X_2, \ldots, X_{100}$ be iid $P(\lambda)$ RVs, where $\lambda = 0.02$. Let $S = S_{100} = \sum_{i=1}^{100} X_i$. Use
the central limit result to evaluate $P\{S \ge 3\}$ and compare your result to the exact
probability of the event $S \ge 3$.

8. Let $X_1, X_2, \ldots, X_{81}$ be iid RVs with mean 54 and variance 225. Use Chebychev's
inequality to find the possible difference between the sample mean and the population
mean with a probability of at least 0.75. Also use the CLT to do the
same.

9. Use the CLT applied to a Poisson RV to show that $\lim_{n\to\infty} e^{-nt}\sum_{k=1}^{n-1} \frac{(nt)^k}{k!} = 1$ for
$0 < t < 1$, $= \frac{1}{2}$ if $t = 1$, and 0 if $t > 1$.

10. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with mean $\mu$ and variance $\sigma^2$, and assume that
$EX_1^4 < \infty$. Write $V_n = \sum_{k=1}^n (X_k - \mu)^2$. Find the centering and norming constants $A_n$
and $B_n$ such that $B_n^{-1}(V_n - A_n) \xrightarrow{L} Z$, where $Z$ is $N(0,1)$.

11. From an urn containing 10 identical balls numbered 0 through 9, $n$ balls are drawn
with replacement.

(a) What does the law of large numbers tell you about the appearance of 0's in the
$n$ drawings?

(b) How many drawings must be made in order that, with probability at least 0.95,
the relative frequency of the occurrence of 0's will be between 0.09 and 0.11?

(c) Use the CLT to find the probability that among the $n$ numbers thus chosen
the number 5 will appear between $(n - 3\sqrt{n})/10$ and $(n + 3\sqrt{n})/10$ times
(inclusive) if (i) $n = 25$ and (ii) $n = 100$.

12. Let $X_1, X_2, \ldots, X_n$ be iid RVs with $EX_1 = 0$ and $EX_1^2 = \sigma^2 < \infty$. Let $\bar{X} = \sum_{i=1}^n X_i/n$,
and for any positive real number $\varepsilon$ let $P_{n,\varepsilon} = P\{\bar{X} \ge \varepsilon\}$. Show that

$$P_{n,\varepsilon} \approx \frac{\sigma}{\varepsilon\sqrt{n}}\,\frac{1}{\sqrt{2\pi}}\,e^{-n\varepsilon^2/2\sigma^2} \quad \text{as } n \to \infty.$$

[Hint: Use (5.3.61).]


7.7 LARGE SAMPLE THEORY 


In many applications of probability one needs the distribution of a statistic or some function
of it. The methods of Section 7.3 when applicable lead to the exact distribution of the
statistic under consideration. If not, it may be sufficient to approximate this distribution
provided the sample size is large enough.

Let $\{X_n\}$ be a sequence of RVs which converges in law to $N(\mu, \sigma^2)$. Then $\{(X_n - \mu)/\sigma\}$
converges in law to $N(0,1)$, and conversely. We will say alternatively and equivalently
that $\{X_n\}$ is asymptotically normal with mean $\mu$ and variance $\sigma^2$. More generally, we
say that $X_n$ is asymptotically normal with "mean" $\mu_n$ and "variance" $\sigma_n^2$, and write $X_n$ is
AN$(\mu_n, \sigma_n^2)$, if $\sigma_n > 0$ and as $n \to \infty$

$$\frac{X_n - \mu_n}{\sigma_n} \xrightarrow{L} N(0,1). \tag{1}$$

Here $\mu_n$ is not necessarily the mean of $X_n$ and $\sigma_n^2$ not necessarily its variance. In this
case we can approximate, for sufficiently large $n$, $P(X_n \leq t)$ by $P\left(Z \leq \frac{t - \mu_n}{\sigma_n}\right)$, where $Z$ is
$N(0,1)$.

The most common method to show that $X_n$ is AN$(\mu_n, \sigma_n^2)$ is the central limit theorem of
Section 7.6. Thus, according to Theorem 7.6.1, $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{L} N(0, \sigma^2)$ as $n \to \infty$, where
$\bar{X}_n$ is the sample mean of $n$ iid RVs with mean $\mu$ and variance $\sigma^2$. The same result applies
to the $k$th sample moment, provided $E|X|^{2k} < \infty$. Thus

$$\sum_{j=1}^n X_j^k/n \ \text{ is AN}\left(EX^k, \frac{\operatorname{var}(X^k)}{n}\right).$$

In many large sample approximations an application of the CLT along with Slutsky's
theorem suffices.


Example 1. Let $X_1, X_2, \ldots$ be iid $N(\mu, \sigma^2)$. Consider the RV

$$T_n = \frac{\sqrt{n}(\bar{X} - \mu)}{S}.$$

The statistic $T_n$ is well-known for its applications in statistics and in Section 6.5 we determined
its exact distribution. From Example 6.3.4, $(n-1)S^2/n \xrightarrow{P} \sigma^2$ and hence $S/\sigma \xrightarrow{P} 1$.
Since $\sqrt{n}(\bar{X} - \mu)/\sigma \sim N(0,1)$, it follows from Slutsky's theorem that $T_n \xrightarrow{L} Z$. Thus
for sufficiently large $n$ ($n > 30$) we can approximate $P(T_n \leq t)$ by $P(Z \leq t)$.

Actually, we do not need $X$'s to be normally distributed (see Problem 7.6.6).

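Often we need to approximate the distribution of $g(Y_n)$ given that $Y_n$ is AN$(\mu, \sigma_n^2)$.

Theorem 1 (Delta Method). Suppose $Y_n$ is AN$(\mu, \sigma_n^2)$, with $\sigma_n \to 0$ and $\mu$ a fixed real
number. Let $g$ be a real-valued function which is differentiable at $x = \mu$, with $g'(\mu) \neq 0$.
Then

$$g(Y_n) \ \text{ is AN}\left(g(\mu), [g'(\mu)]^2\sigma_n^2\right). \tag{2}$$

Proof. We first show that

$$\frac{g(Y_n) - g(\mu)}{g'(\mu)\sigma_n} - \frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0. \tag{3}$$

Set

$$h(x) = \begin{cases} \dfrac{g(x) - g(\mu)}{x - \mu} - g'(\mu), & x \neq \mu, \\ 0, & x = \mu. \end{cases}$$

Then $h$ is continuous at $x = \mu$. Since

$$Y_n - \mu = \sigma_n\left[\frac{Y_n - \mu}{\sigma_n}\right] \xrightarrow{L} 0,$$

by Problem 7.2.7, $Y_n - \mu \xrightarrow{P} 0$, and it follows from Theorem 7.2.4 that $h(Y_n) \xrightarrow{P} h(\mu) = 0$.
By Slutsky's theorem, therefore,

$$h(Y_n)\,\frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0.$$

That is, on dividing by $g'(\mu) \neq 0$,

$$\frac{g(Y_n) - g(\mu)}{\sigma_n g'(\mu)} - \frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0.$$

It follows again by Slutsky's theorem that $[g(Y_n) - g(\mu)]/[g'(\mu)\sigma_n]$ has the same limit
law as $(Y_n - \mu)/\sigma_n$.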


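Example 2. We know by the CLT that $Y_n = \bar{X}$ is AN$(\mu, \sigma^2/n)$. Suppose $g(\bar{X}) =
\bar{X}(1 - \bar{X})$ where $\bar{X}$ is the sample mean in random sampling from a population with mean $\mu$
and variance $\sigma^2$. Since $g'(\mu) = 1 - 2\mu \neq 0$ for $\mu \neq 1/2$, it follows that for $\mu \neq 1/2$,
$\sigma^2 < \infty$, $\bar{X}(1 - \bar{X})$ is AN$(\mu(1-\mu), (1-2\mu)^2\sigma^2/n)$. Thus

$$P(\bar{X}(1-\bar{X}) \leq y) = P\left(\frac{\bar{X}(1-\bar{X}) - \mu(1-\mu)}{|1-2\mu|\sigma/\sqrt{n}} \leq \frac{y - \mu(1-\mu)}{|1-2\mu|\sigma/\sqrt{n}}\right) \approx P\left(Z \leq \frac{y - \mu(1-\mu)}{|1-2\mu|\sigma/\sqrt{n}}\right),$$

where $Z$ is $N(0,1)$, for large $n$.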
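As a rough numerical illustration of Example 2, the sketch below takes the population to be $b(1,p)$ (an assumption made here for concreteness, so that $\mu = p$ and $\sigma^2 = p(1-p)$) and compares the exact probability $P(\bar{X}(1-\bar{X}) \leq y)$, obtained by enumerating the binomial distribution of $n\bar{X}$, with the delta-method approximation. The function names and the values $n = 400$, $p = 0.3$, $y = 0.2$ are illustrative only.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Standard normal DF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(t: int, n: int, p: float) -> float:
    """Exact b(n, p) probability of t successes."""
    return math.comb(n, t) * p ** t * (1.0 - p) ** (n - t)

def exact_cdf_g(y: float, n: int, p: float) -> float:
    """Exact P(Xbar(1 - Xbar) <= y) for iid b(1, p) data, by enumeration over t = n * Xbar."""
    # Xbar(1 - Xbar) = t(n - t) / n^2, so the event is t(n - t) <= y * n^2.
    return sum(binom_pmf(t, n, p) for t in range(n + 1) if t * (n - t) <= y * n * n)

def delta_cdf_g(y: float, n: int, p: float) -> float:
    """Delta-method value: Xbar(1 - Xbar) treated as AN(p(1-p), (1-2p)^2 p(1-p)/n)."""
    sd = abs(1.0 - 2.0 * p) * math.sqrt(p * (1.0 - p) / n)
    return std_normal_cdf((y - p * (1.0 - p)) / sd)

n, p, y = 400, 0.3, 0.2
exact = exact_cdf_g(y, n, p)
approx = delta_cdf_g(y, n, p)
print(exact, approx)
```

The two values agree only roughly at this sample size; the approximation uses no continuity correction and degrades as $p$ approaches 1/2, where $g'(\mu) = 0$.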


Remark 1. Suppose $g$ in Theorem 1 is differentiable $k$ times, $k \geq 1$, at $x = \mu$ and $g^{(i)}(\mu) = 0$
for $1 \leq i \leq k-1$, $g^{(k)}(\mu) \neq 0$. Then a similar argument using Taylor's theorem shows that

$$\frac{g(Y_n) - g(\mu)}{\frac{1}{k!}g^{(k)}(\mu)\sigma_n^k} \xrightarrow{L} Z^k, \tag{4}$$

where $Z$ is a $N(0,1)$ RV. Thus in Example 2, when $\mu = 1/2$, $g'(1/2) = 0$ and $g''(1/2) =
-2 \neq 0$. It follows that

$$n[\bar{X}(1 - \bar{X}) - 1/4] \xrightarrow{L} -\sigma^2\chi^2(1)$$

since $Z^2 \sim \chi^2(1)$.

Remark 2. Theorem 1 can be extended to the multivariate case but we will not pursue the
development. We refer the reader to Ferguson [29] or Serfling [102].

Remark 3. In general the asymptotic variance $[g'(\mu)]^2\sigma_n^2$ of $g(Y_n)$ will depend on the
parameter $\mu$. In problems of inference it will often be desirable to use a transformation
$g$ such that the approximate variance $\operatorname{var} g(Y_n)$ is free of the parameter. Such transformations
are called variance stabilizing transformations. Let us write $\sigma_n^2 = \sigma^2(\mu)/n$. Then
finding a $g$ such that $\operatorname{var} g(Y_n)$ is free of $\mu$ is equivalent to finding a $g$ such that

$$g'(\mu) = c/\sigma(\mu)$$

for all $\mu$, where $c$ is a constant independent of $\mu$. It follows that

$$g(x) = c\int \frac{dx}{\sigma(x)}. \tag{5}$$


Example 3. In Example 2, suppose $X_1, \ldots, X_n$ are iid $b(1,p)$. Then
$\sigma^2(p) = p(1-p)$ and (5) reduces to

$$g(x) = c\int \frac{dx}{\sqrt{x(1-x)}} = 2c\arcsin\sqrt{x}.$$

Since $g(0) = 0$, $g(1) = 1$, $c = 1/\pi$, and $g(x) = (2/\pi)\arcsin\sqrt{x}$.
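The stabilizing effect of the arcsine transformation can be checked numerically. The sketch below is illustrative only (the function names and the choice $n = 200$ are assumptions made here): it computes $n \operatorname{var} g(\bar{X})$ exactly by enumerating the binomial distribution of $n\bar{X}$, once for the identity and once for $g(x) = (2/\pi)\arcsin\sqrt{x}$. With $g'(p) = c/\sigma(p)$ and $c = 1/\pi$, the delta method suggests $n \operatorname{var} g(\bar{X}) \approx c^2 = 1/\pi^2$ whatever the value of $p$.

```python
import math

def binom_pmf(t: int, n: int, p: float) -> float:
    """Exact b(n, p) probability of t successes."""
    return math.comb(n, t) * p ** t * (1.0 - p) ** (n - t)

def scaled_variance(n: int, p: float, g) -> float:
    """Return n * var g(Xbar) for iid b(1, p) data, computed exactly by enumeration."""
    values = [g(t / n) for t in range(n + 1)]
    probs = [binom_pmf(t, n, p) for t in range(n + 1)]
    mean = sum(v * w for v, w in zip(values, probs))
    return n * sum(w * (v - mean) ** 2 for v, w in zip(values, probs))

def identity(x: float) -> float:
    return x

def arcsine(x: float) -> float:
    """Variance stabilizing transformation g(x) = (2/pi) arcsin(sqrt(x))."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

n = 200
for p in (0.3, 0.5, 0.7):
    # Untransformed: n * var = p(1 - p), which changes with p.
    # Transformed: n * var stays close to 1 / pi^2 for every p.
    print(p, scaled_variance(n, p, identity), scaled_variance(n, p, arcsine))
```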

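Remark 4. In Section 6.3 we computed exact moments of some statistics in terms of population
parameters. Approximations for moments of $g(X)$ can also be obtained from series
expansions of $g$. Suppose $g$ is twice differentiable at $x = \mu$. Then

$$Eg(X) \approx g(\mu) + E(X - \mu)g'(\mu) + \tfrac{1}{2}g''(\mu)E(X - \mu)^2 \tag{6}$$

and

$$E[g(X) - g(\mu)]^2 \approx [g'(\mu)]^2E(X - \mu)^2, \tag{7}$$

by dropping remainder terms. The case of most interest is to approximate $Eg(\bar{X})$ and
$\operatorname{var} g(\bar{X})$. In this case, under suitable conditions, one can show that

$$Eg(\bar{X}) \approx g(\mu) + \frac{\sigma^2}{2n}g''(\mu) \tag{8}$$

and

$$\operatorname{var} g(\bar{X}) \approx \frac{\sigma^2}{n}[g'(\mu)]^2, \tag{9}$$

where $EX = \mu$ and $\operatorname{var}(X) = \sigma^2$.

In Example 2, when $X_i$'s are iid $b(1,p)$, and $g(x) = x(1-x)$, $g'(x) = 1 - 2x$, $g''(x) = -2$
so that

$$Eg(\bar{X}) = E[\bar{X}(1 - \bar{X})] \approx p(1-p) + \frac{p(1-p)}{2n}(-2) = p(1-p)\frac{n-1}{n}$$

and

$$\operatorname{var} g(\bar{X}) \approx \frac{p(1-p)}{n}(1-2p)^2.$$

In this case we can compute $Eg(\bar{X})$ and $\operatorname{var} g(\bar{X})$ exactly. We have

$$Eg(\bar{X}) = E\bar{X} - E\bar{X}^2 = p - \left(\frac{p(1-p)}{n} + p^2\right) = p(1-p)\frac{n-1}{n}$$

so that (8) is exact. Also since $X_i^2 = X_i$, using Theorem 6.3.4 we have

$$\operatorname{var} g(\bar{X}) = \operatorname{var}(\bar{X} - \bar{X}^2) = \operatorname{var}\bar{X} - 2\operatorname{cov}(\bar{X}, \bar{X}^2) + E\bar{X}^4 - (E\bar{X}^2)^2$$

$$= \frac{p(1-p)}{n}\left\{(1-2p)^2 + \frac{2p(1-p)}{n-1}\right\}\left(\frac{n-1}{n}\right)^2.$$

Thus the error in approximation (9) is of order $n^{-2}$; the part contributed by the second term
in braces is

$$\frac{2p^2(1-p)^2}{n^3}(n-1).$$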

Remark 5. Approximations (6) through (9) do not assert the existence of $Eg(X)$ or $Eg(\bar{X})$,
or $\operatorname{var} g(X)$ or $\operatorname{var} g(\bar{X})$.


Remark 6. It is possible to extend (6) through (9) to two (or more) variables by using
Taylor series expansion in two (or more) variables.


Finally, we state the following result which gives the asymptotic distribution of the $r$th
order statistic, $1 \leq r \leq n$, in sampling from a population with an absolutely continuous DF
$F$ with PDF $f$. For a proof see Problem 4.


Theorem 2. If $X_{(r)}$ denotes the $r$th order statistic of a sample $X_1, X_2, \ldots, X_n$ from an
absolutely continuous DF $F$ with PDF $f$, then

$$\left\{\frac{n}{p(1-p)}\right\}^{1/2} f(\mathfrak{z}_p)\,(X_{(r)} - \mathfrak{z}_p) \xrightarrow{L} Z, \quad \text{as } n \to \infty, \tag{10}$$

so that $r/n$ remains fixed, $r/n = p$, where $Z$ is $N(0,1)$, and $\mathfrak{z}_p$ is the unique solution of
$F(\mathfrak{z}_p) = p$ (that is, $\mathfrak{z}_p$ is the population quantile of order $p$ assumed unique).

Remark 7. The sample quantile of order $p$, $Z_p$, is

$$\text{AN}\left(\mathfrak{z}_p, \frac{1}{[f(\mathfrak{z}_p)]^2}\,\frac{p(1-p)}{n}\right),$$

where $\mathfrak{z}_p$ is the corresponding population quantile, and $f$ is the PDF of the population
distribution function. It also follows that $Z_p \xrightarrow{P} \mathfrak{z}_p$.
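For a uniform population the asymptotic variance of a sample quantile can be compared with an exact value. The sketch below assumes sampling from $U(0,1)$, where the $r$th order statistic has a beta distribution with parameters $r$ and $n-r+1$ and therefore variance $r(n-r+1)/[(n+1)^2(n+2)]$, and compares it with the standard asymptotic variance $p(1-p)/\{n[f(\mathfrak{z}_p)]^2\}$, $p = r/n$, $f \equiv 1$. The function names are illustrative.

```python
def exact_var_uniform_order_stat(r: int, n: int) -> float:
    """Exact variance of the r-th order statistic of n iid U(0,1) RVs (beta distribution)."""
    return r * (n - r + 1) / ((n + 1) ** 2 * (n + 2))

def asymptotic_var_quantile(p: float, n: int, density_at_quantile: float) -> float:
    """Asymptotic variance p(1 - p) / (n f(z_p)^2) of the sample quantile of order p."""
    return p * (1.0 - p) / (n * density_at_quantile ** 2)

for n in (11, 101, 1001):
    r = (n + 1) // 2                      # the sample median for odd n
    exact = exact_var_uniform_order_stat(r, n)
    approx = asymptotic_var_quantile(r / n, n, 1.0)
    print(n, exact, approx, exact / approx)
```

The ratio of the two variances approaches 1 as $n$ grows, which is what the asymptotic statement leads one to expect.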


PROBLEMS 7.7 


1. In sampling from a distribution with mean $\mu$ and variance $\sigma^2$ find the asymptotic
distribution of
(a) $\bar{X}_n^2$, (b) $1/\bar{X}_n$, (c) $n|\bar{X}_n|^2$, (d) $\exp(\bar{X}_n)$,
both when $\mu \neq 0$ and when $\mu = 0$.

2. Let $X \sim P(\lambda)$. Then $(X - \lambda)/\sqrt{\lambda} \xrightarrow{L} N(0,1)$. Find a transformation $g$ such that
$(g(X) - g(\lambda))$ has an asymptotic $N(0,c)$ distribution for large $\lambda$ where $c$ is a suitable
constant.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from an absolutely continuous DF $F$ with PDF $f$.
Show that

$$EX_{(r)} \approx F^{-1}\left(\frac{r}{n+1}\right)$$

and

$$\operatorname{var}(X_{(r)}) \approx \frac{r(n-r+1)}{(n+1)^2(n+2)}\,\frac{1}{\{f[F^{-1}(r/(n+1))]\}^2}.$$

[Hint: Let $Y$ be an RV with mean $\mu$ and $\phi$ be a Borel function such that $E\phi(Y)$ exists.
Expand $\phi(Y)$ about the point $\mu$ by a Taylor series expansion, and use the fact that
$F(X_{(r)}) = U_{(r)}$.]

4. Prove Theorem 2. [Hint: For any real $\mu$ and $\sigma$ ($> 0$) compute the PDF of
$(U_{(r)} - \mu)/\sigma$ and show that the standardized $U_{(r)}$, $(U_{(r)} - \mu)/\sigma$, is asymptotically
$N(0,1)$ under the conditions of the theorem.]

5. Let $X \sim \chi^2(n)$. Then $(X - n)/\sqrt{2n}$ is AN$(0,1)$ and $X/n$ is AN$(1, 2/n)$. Find a
transformation $g$ such that the distribution of $g(X) - g(n)$ is AN$(0,c)$.

6. Suppose $X$ is $G(1,\theta)$. Find $g$ such that $g(\bar{X}) - g(\theta)$ is AN$(0,c)$.

7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with $E|X_1|^4 < \infty$. Let $\operatorname{var}(X) = \sigma^2$ and $\beta_2 = \mu_4/\sigma^4$.
(a) Show, using the CLT for iid RVs, that $\sqrt{n}(S^2 - \sigma^2) \xrightarrow{L} N(0, \mu_4 - \sigma^4)$.

(b) Find a transformation $g$ such that $g(S^2)$ has an asymptotic distribution which
depends on $\beta_2$ alone but not on $\sigma^2$.

PARAMETRIC POINT ESTIMATION 


8.1 INTRODUCTION 


In this chapter we study the theory of point estimation. Suppose, for example, that a random
variable $X$ is known to have a normal distribution $N(\mu, \sigma^2)$, but we do not know one of
the parameters, say $\mu$. Suppose further that a sample $X_1, X_2, \ldots, X_n$ is taken on $X$. The problem
of point estimation is to pick a (one-dimensional) statistic $T(X_1, X_2, \ldots, X_n)$ that best
estimates the parameter $\mu$. The numerical value of $T$ when the realization is $x_1, x_2, \ldots, x_n$
is frequently called an estimate of $\mu$, while the statistic $T$ is called an estimator of $\mu$. If
both $\mu$ and $\sigma^2$ are unknown, we seek a joint statistic $T = (U, V)$ as an estimator of $(\mu, \sigma^2)$.

In Section 8.2 we formally describe the problem of parametric point estimation. Since 
the class of all estimators in most problems is too large it is not possible to find the “best” 
estimator in this class. One narrows the search somewhat by requiring that the estimators 
have some specified desirable properties. We describe some of these and also outline some 
criteria for comparing estimators. 

Section 8.3 deals, in detail, with some important properties of statistics such as suffi- 
ciency, completeness, and ancillarity. We use these properties in later sections to facilitate 
our search for optimal estimators. Sufficiency, completeness, and ancillarity also have 
applications in other branches of statistical inference such as testing of hypotheses and 
nonparametric theory. 

In Section 8.4 we investigate the criterion of unbiased estimation and study methods for 
obtaining optimal estimators in the class of unbiased estimators. In Section 8.5 we derive 
two lower bounds for variance of an unbiased estimator. These bounds can sometimes help 
in obtaining the “best” unbiased estimator. 


An Introduction to Probability and Statistics, Third Edition. Vijay K. Rohatgi and A.K. Md. Ehsanes Saleh. 
© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc. 




In Section 8.6 we describe one of the oldest methods of estimation and in Section 8.7 
we study the method of maximum likelihood estimation and its large sample properties. 
Section 8.8 is devoted to Bayes and minimax estimation, and Section 8.9 deals with 
equivariant estimation. 


8.2 PROBLEM OF POINT ESTIMATION


Let $X$ be an RV defined on a probability space $(\Omega, \mathcal{S}, P)$. Suppose that the DF $F$ of $X$
depends on a certain number of parameters, and suppose further that the functional form of
$F$ is known except perhaps for a finite number of these parameters. Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$
be the unknown parameter associated with $F$.


Definition 1. The set of all admissible values of the parameters of a DF $F$ is called the
parameter space.


Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be an RV with DF $F_\theta$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ is a vector
of unknown parameters, $\theta \in \Theta$. Let $\psi$ be a real-valued function on $\Theta$. In this chapter we
investigate the problem of approximating $\psi(\theta)$ on the basis of the observed value $\mathbf{x}$ of $\mathbf{X}$.


Definition 2. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n) \sim P_\theta$, $\theta \in \Theta$. A statistic $\delta(\mathbf{X})$ is said to be a (point)
estimator of $\psi$ if $\delta : \mathfrak{X} \to \Theta$, where $\mathfrak{X}$ is the space of values of $\mathbf{X}$.

The problem of point estimation is to find an estimator $\delta$ for the unknown parametric
function $\psi(\theta)$ that has some nice properties. The value $\delta(\mathbf{x})$ of $\delta(\mathbf{X})$ for the data $\mathbf{x}$ is
called the estimate of $\psi(\theta)$.

In most problems $X_1, X_2, \ldots, X_n$ are iid RVs with common DF $F_\theta$.


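Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $G(1,\theta)$, where $\Theta = \{\theta > 0\}$ and $\theta$ is to be estimated.
Then $\mathfrak{X} = \mathcal{R}_n^+$ and any map $\delta : \mathfrak{X} \to (0, \infty)$ is an estimator of $\theta$. Some typical estimators
of $\theta$ are $\bar{X} = n^{-1}\sum_{j=1}^n X_j$ and $\{2/[n(n+1)]\}\sum_{j=1}^n jX_j$.


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $b(1,p)$ RVs, where $p \in [0,1]$. Then $\bar{X}$ is an estimator
of $p$ and so also are $\delta_1(\mathbf{X}) = X_1$, $\delta_2(\mathbf{X}) = (X_1 + X_n)/2$, and $\delta_3(\mathbf{X}) = \sum_{j=1}^n a_jX_j$, where
$0 \leq a_j \leq 1$, $\sum_{j=1}^n a_j = 1$.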


It is clear that in any given problem of estimation we may have a large, often an infinite,
class of appropriate estimators to choose from. Clearly we would like the estimator $\delta$ to
be close to $\psi(\theta)$, and since $\delta$ is a statistic, the usual measure of closeness $|\delta(\mathbf{X}) - \psi(\theta)|$
is also an RV, we interpret "$\delta$ close to $\psi$" to mean "close on the average." Examples of
such measures of closeness are

$$P_\theta\{|\delta(\mathbf{X}) - \psi(\theta)| \leq \varepsilon\} \tag{1}$$

for some $\varepsilon > 0$, and

$$E_\theta|\delta(\mathbf{X}) - \psi(\theta)|^r \tag{2}$$

for some $r > 0$. Obviously we want (1) to be large whereas (2) to be small. For $r = 2$, the
quantity defined in (2) is called mean square error and we denote it by

$$\text{MSE}_\theta(\delta) = E_\theta\{\delta(\mathbf{X}) - \psi(\theta)\}^2. \tag{3}$$

Among all estimators for $\psi$ we would like to choose one, say $\delta_0$, such that

$$P_\theta\{|\delta_0(\mathbf{X}) - \psi(\theta)| \leq \varepsilon\} \geq P_\theta\{|\delta(\mathbf{X}) - \psi(\theta)| \leq \varepsilon\} \tag{4}$$

for all $\delta$, all $\varepsilon > 0$ and all $\theta$. In case of (2) the requirement is to choose $\delta_0$ such that

$$\text{MSE}_\theta(\delta_0) \leq \text{MSE}_\theta(\delta) \tag{5}$$

for all $\delta$, and all $\theta \in \Theta$. Estimators satisfying (4) or (5) do not generally exist.
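We note that

$$\text{MSE}_\theta(\delta) = E_\theta\{\delta(\mathbf{X}) - E_\theta\delta(\mathbf{X})\}^2 + \{E_\theta\delta(\mathbf{X}) - \psi(\theta)\}^2 = \operatorname{var}_\theta\delta(\mathbf{X}) + \{b(\delta, \psi)\}^2, \tag{6}$$

where

$$b(\delta, \psi) = E_\theta\delta(\mathbf{X}) - \psi(\theta) \tag{7}$$

is called the bias of $\delta$. An estimator that has small MSE has small bias and variance. In
order to control MSE, we need to control both variance and bias.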
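The decomposition of MSE into variance and squared bias can be checked exactly for estimators of $p$ based on iid $b(1,p)$ data, since such estimators depend on the data only through the binomial count $t = \sum x_i$ in the two cases used here. The sketch below is illustrative; the function names and the second estimator $(t+1)/(n+2)$ are chosen here as examples.

```python
import math

def binom_pmf(t: int, n: int, p: float) -> float:
    """Exact b(n, p) probability of t successes."""
    return math.comb(n, t) * p ** t * (1.0 - p) ** (n - t)

def mse_parts(n: int, p: float, estimator):
    """Exact (MSE, variance, bias) of estimator(t, n) as an estimator of p, t ~ b(n, p)."""
    w = [binom_pmf(t, n, p) for t in range(n + 1)]
    d = [estimator(t, n) for t in range(n + 1)]
    mean = sum(di * wi for di, wi in zip(d, w))
    var = sum(wi * (di - mean) ** 2 for di, wi in zip(d, w))
    mse = sum(wi * (di - p) ** 2 for di, wi in zip(d, w))
    return mse, var, mean - p

def sample_mean(t: int, n: int) -> float:
    return t / n

def shrunk(t: int, n: int) -> float:
    """A biased alternative, (t + 1)/(n + 2), used only for comparison."""
    return (t + 1) / (n + 2)

for est in (sample_mean, shrunk):
    mse, var, bias = mse_parts(10, 0.3, est)
    print(est.__name__, mse, var + bias ** 2)   # the two numbers agree
```

A biased estimator may have smaller MSE than an unbiased one at some parameter values, which is why both variance and bias have to be controlled.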
One approach is to restrict attention to estimators which have zero bias, that is,

$$E_\theta\delta(\mathbf{X}) = \psi(\theta) \quad \text{for all } \theta \in \Theta. \tag{8}$$

The condition of unbiasedness (8) ensures that, on the average, the estimator $\delta$ has no systematic
error; it neither over- nor underestimates $\psi$ on the average. If we restrict attention
only to the class of unbiased estimators then we need to find an estimator $\delta_0$ in this class
such that $\delta_0$ has the least variance for all $\theta \in \Theta$. The theory of unbiased estimation is
developed in Section 8.4.

Another approach is to replace $|\delta - \psi|^r$ in (2) by a more general function. Let $L(\theta, \delta)$
measure the loss in estimating $\psi$ by $\delta$. Assume that $L$, the loss function, satisfies $L(\theta, \delta) \geq 0$
for all $\theta$ and $\delta$, and $L(\theta, \psi(\theta)) = 0$ for all $\theta$. Measure average loss by the risk function

$$R(\theta, \delta) = E_\theta L(\theta, \delta(\mathbf{X})). \tag{9}$$

Instead of seeking an estimator which minimizes the risk $R$ uniformly in $\theta$, we minimize

$$\int_\Theta R(\theta, \delta)\pi(\theta)\,d\theta \tag{10}$$

for some weight function $\pi$ on $\Theta$, or minimize

$$\sup_{\theta \in \Theta} R(\theta, \delta). \tag{11}$$

The estimator that minimizes the average risk defined in (10) leads to the Bayes estimator
and the estimator that minimizes (11) leads to the minimax estimator. Bayes and minimax
estimation are discussed in Section 8.8.

Sometimes there are symmetries in the problem which may be used to restrict attention
only to estimators which also exhibit the same symmetry. Consider, for example, an experiment
in which the length of life of a light bulb is measured. Then an estimator obtained
from the measurements expressed in hours and minutes must agree with an estimator
obtained from the measurements expressed in minutes. If $\mathbf{X}$ represents measurements in
original units (hours) and $\mathbf{Y}$ represents corresponding measurements in transformed units
(minutes) then $\mathbf{Y} = c\mathbf{X}$ (here $c = 60$). If $\delta(\mathbf{X})$ is an estimator of the true mean, then we
would expect $\delta(\mathbf{Y})$, the estimator of the true mean, to correspond to $\delta(\mathbf{X})$ according to the
relation $\delta(\mathbf{Y}) = c\delta(\mathbf{X})$. That is, $\delta(c\mathbf{X}) = c\delta(\mathbf{X})$, for all $c > 0$. This is an example of an
equivariant estimator, which is discussed extensively in Section 8.9.

Finally, we consider some large sample properties of estimators. As the sample size
$n \to \infty$, the data $\mathbf{x}$ are practically the whole population, and we should expect $\delta(\mathbf{X})$ to
approach $\psi(\theta)$ in some sense. For example, if $\delta(\mathbf{X}) = \bar{X}$, $\psi(\theta) = E_\theta X_1$, and $X_1, X_2, \ldots, X_n$
are iid RVs with finite mean then the strong law of large numbers tells us that $\bar{X} \to E_\theta X_1$ with
probability 1. This property of a sequence of estimators is called consistency.


Definition 3. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common DF $F_\theta$, $\theta \in \Theta$. A
sequence of point estimators $T_n(X_1, X_2, \ldots, X_n) = T_n$ will be called consistent for $\psi(\theta)$ if

$$T_n \xrightarrow{P} \psi(\theta) \quad \text{as } n \to \infty$$

for each fixed $\theta \in \Theta$.


Remark 1. Recall that $T_n \xrightarrow{P} \psi(\theta)$ if and only if $P\{|T_n - \psi(\theta)| > \varepsilon\} \to 0$ as $n \to \infty$ for
every $\varepsilon > 0$. One can similarly define strong consistency of a sequence of estimators $T_n$ if
$T_n \xrightarrow{a.s.} \psi(\theta)$. Sometimes one speaks of consistency in the $r$th mean when $T_n \xrightarrow{r} \psi(\theta)$.
In what follows, "consistency" will mean weak consistency of $T_n$ for $\psi(\theta)$, that is,

$$T_n \xrightarrow{P} \psi(\theta).$$

It is important to remember that consistency is a large sample property. Moreover, we
speak of consistency of a sequence of estimators rather than one point estimator.


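Example 3. Let $X_1, X_2, \ldots$ be iid $b(1,p)$ RVs. Then $EX_1 = p$, and it follows by the
WLLN that

$$\frac{\sum_{i=1}^n X_i}{n} \xrightarrow{P} p.$$

Thus $\bar{X}$ is consistent for $p$. Also $(\sum_{i=1}^n X_i + 1)/(n+2) \xrightarrow{P} p$, so that a consistent estimator
need not be unique. Indeed, if $T_n \xrightarrow{P} p$, and $c_n \to 0$ as $n \to \infty$, then $T_n + c_n \xrightarrow{P} p$ and if
$d_n \to 1$ then $d_nT_n \xrightarrow{P} p$.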
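Weak consistency of $\bar{X}$ in Example 3 can be seen numerically by computing $P\{|\bar{X} - p| > \varepsilon\}$ exactly for increasing $n$. The following sketch is illustrative only: the function name and the values $p = 3/10$, $\varepsilon = 1/20$ are assumptions made here, and exact rational arithmetic is used to decide membership in the event so that boundary cases are not affected by rounding.

```python
import math
from fractions import Fraction

def prob_far(n: int, p: Fraction, eps: Fraction) -> float:
    """Exact P(|Xbar - p| > eps) for n iid b(1, p) RVs, 0 < p < 1."""
    log_p, log_q = math.log(float(p)), math.log(1.0 - float(p))
    total = 0.0
    for t in range(n + 1):
        # The event is tested in exact arithmetic; the binomial weight in log space.
        if abs(Fraction(t, n) - p) > eps:
            log_w = (math.lgamma(n + 1) - math.lgamma(t + 1) - math.lgamma(n - t + 1)
                     + t * log_p + (n - t) * log_q)
            total += math.exp(log_w)
    return total

p, eps = Fraction(3, 10), Fraction(1, 20)
probs = [prob_far(n, p, eps) for n in (10, 100, 1000)]
print(probs)   # decreases toward 0 as n grows
```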


Theorem 1. If $X_1, X_2, \ldots$ are iid RVs with common law $\mathcal{L}(X)$, and $E|X|^p < \infty$ for some
positive integer $p$, then

$$\frac{\sum_{i=1}^n X_i^k}{n} \xrightarrow{P} EX^k \quad \text{for } 1 \leq k \leq p,$$

and $n^{-1}\sum_{i=1}^n X_i^k$ is consistent for $EX^k$, $1 \leq k \leq p$. Moreover, if $c_n$ is any sequence of constants
such that $c_n \to 0$ as $n \to \infty$, then $\{n^{-1}\sum_{i=1}^n X_i^k + c_n\}$ is also consistent for $EX^k$,
$1 \leq k \leq p$. Also, if $c_n \to 1$ as $n \to \infty$, then $\{c_n n^{-1}\sum_{i=1}^n X_i^k\}$ is consistent for $EX^k$. This
is simply a restatement of the WLLN for iid RVs.


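Example 4. Let $X_1, X_2, \ldots$ be iid $N(\mu, \sigma^2)$ RVs. If $S^2$ is the sample variance, we know that
$(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$. Thus $E(S^2/\sigma^2) = 1$ and $\operatorname{var}(S^2/\sigma^2) = 2/(n-1)$. It follows that

$$P\{|S^2 - \sigma^2| > \varepsilon\} \leq \frac{\operatorname{var}(S^2)}{\varepsilon^2} = \frac{2\sigma^4}{(n-1)\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$

Thus $S^2 \xrightarrow{P} \sigma^2$. Actually, this result holds for any sequence of iid RVs with $E|X|^2 < \infty$ and
can be obtained from Theorem 1.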


Example 4 is a particular case of the following theorem. 


Theorem 2. If $T_n$ is a sequence of estimators such that $ET_n \to \psi(\theta)$ and $\operatorname{var}(T_n) \to 0$ as
$n \to \infty$, then $T_n$ is consistent for $\psi(\theta)$.


Proof. We have

$$P\{|T_n - \psi(\theta)| > \varepsilon\} \leq \varepsilon^{-2}E\{T_n - ET_n + ET_n - \psi(\theta)\}^2 = \varepsilon^{-2}\{\operatorname{var}(T_n) + (ET_n - \psi(\theta))^2\} \to 0 \quad \text{as } n \to \infty.$$


Other large sample properties of estimators are asymptotic unbiasedness, asymptotic
normality, and asymptotic efficiency. A sequence of estimators $\{T_n\}$ is asymptotically
unbiased for $\psi(\theta)$ if

$$\lim_{n\to\infty} E_\theta T_n(\mathbf{X}) = \psi(\theta)$$

for all $\theta$. A consistent sequence of estimators $\{T_n\}$ is said to be consistent asymptotically
normal (CAN) for $\psi(\theta)$ if $T_n \sim$ AN$(\psi(\theta), v(\theta)/n)$ for all $\theta \in \Theta$. If $v(\theta) = 1/I(\theta)$, where
$I(\theta)$ is the Fisher information (Section 8.7), then $\{T_n\}$ is known as a best asymptotically
normal (BAN) estimator.


Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $N(\theta, 1)$ RVs. Then $T_n = \sum_{i=1}^n X_i/(n+1)$ is asymptotically
unbiased for $\theta$ and a BAN estimator for $\theta$ with $v(\theta) = 1$.


In Section 8.7 we consider large sample properties of maximum likelihood estimators 
and in Section 8.5 asymptotic efficiency is introduced. 


PROBLEMS 8.2 


1. Suppose that $T_n$ is a sequence of estimators for parameter $\theta$ that satisfies the conditions
of Theorem 2. Then $T_n \xrightarrow{2} \theta$, that is, $T_n$ is squared error consistent for $\theta$. If $T_n$
is consistent for $\theta$ and $|T_n - \theta| \leq A < \infty$ for all $\theta$ and all $(x_1, x_2, \ldots, x_n) \in \mathcal{R}_n$, show
that $T_n \xrightarrow{2} \theta$. If, however, $|T_n - \theta| \leq A_n < \infty$, then show that $T_n$ may not be squared
error consistent for $\theta$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[0, \theta]$, $\theta \in \Theta = (0, \infty)$. Let $X_{(n)} = \max
\{X_1, X_2, \ldots, X_n\}$. Show that $X_{(n)} \xrightarrow{P} \theta$. Write $Y_n = 2\bar{X}$. Is $Y_n$ consistent for $\theta$?

3. Let $X_1, X_2, \ldots, X_n$ be iid RVs with $EX_i = \mu$ and $E|X_i|^2 < \infty$. Show that
$T(X_1, X_2, \ldots, X_n) = 2[n(n+1)]^{-1}\sum_{i=1}^n iX_i$ is a consistent estimator for $\mu$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[0, \theta]$. Show that $T(X_1, X_2, \ldots, X_n) =
(\prod_{i=1}^n X_i)^{1/n}$ is a consistent estimator for $\theta e^{-1}$.

5. In Problem 2 show that $T(\mathbf{X}) = X_{(n)}$ is asymptotically biased for $\theta$ and is not BAN.
(Show that $n(\theta - X_{(n)}) \xrightarrow{L} G(1, \theta)$.)

6. In Problem 5 consider the class of estimators $T(\mathbf{X}) = cX_{(n)}$, $c > 0$. Show that the
estimator $T_0(\mathbf{X}) = (n+2)X_{(n)}/(n+1)$ in this class has the least MSE.

7. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \exp\{-(x - \theta)\}$, $x > \theta$. Consider the class
of estimators $T(\mathbf{X}) = X_{(1)} + b$, $b \in \mathcal{R}$. Show that the estimator that has the smallest
MSE in this class is given by $T(\mathbf{X}) = X_{(1)} - 1/n$.


8.3 SUFFICIENCY, COMPLETENESS AND ANCILLARITY


After the completion of any experiment, the job of a statistician is to interpret the data she 
has collected and to draw some statistically valid conclusions about the population under 
investigation. The raw data by themselves, besides being costly to store, are not suitable 
for this purpose. Therefore the statistician would like to condense the data by computing 
some statistics from them and to base her analysis on these statistics, provided that there is 
“no loss of information” in doing so. In many problems of statistical inference a function 
of the observations contains as much information about the unknown parameter as do all 
the observed values. The following example illustrates this point. 


Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$, where $\mu$ is unknown. Suppose
that we transform variables $X_1, X_2, \ldots, X_n$ to $Y_1, Y_2, \ldots, Y_n$ with the help of an orthogonal
transformation so that $Y_1$ is $N(\sqrt{n}\mu, 1)$, $Y_2, \ldots, Y_n$ are iid $N(0,1)$, and $Y_1, Y_2, \ldots, Y_n$
are independent. (Take $y_1 = \sqrt{n}\,\bar{x}$, and, for $k = 2, \ldots, n$, $y_k = [(k-1)x_k - (x_1 + \cdots +
x_{k-1})]/\sqrt{k(k-1)}$.) To estimate $\mu$ we can use either the observed values of $X_1, X_2, \ldots, X_n$
or simply the observed value of $Y_1 = \sqrt{n}\,\bar{X}$. The RVs $Y_2, Y_3, \ldots, Y_n$ provide no information
about $\mu$. Clearly, $Y_1$ is preferable since one need not keep a record of all the observations;
it suffices to cumulate the observations and compute $y_1$. Any analysis of the data based
on $y_1$ is just as effective as any analysis that could be based on $x_i$'s. We note that $Y_1$ takes
values in $\mathcal{R}_1$, whereas $(X_1, X_2, \ldots, X_n)$ takes values in $\mathcal{R}_n$.
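That the transformation described in Example 1 is orthogonal can be confirmed numerically. The sketch below builds the matrix whose first row gives $y_1 = \sqrt{n}\,\bar{x}$ and whose $k$th row gives $y_k = [(k-1)x_k - (x_1 + \cdots + x_{k-1})]/\sqrt{k(k-1)}$, and checks that its rows are orthonormal. The function names and the sample values are illustrative.

```python
import math

def helmert_rows(n: int):
    """Rows of the transformation in Example 1 (a Helmert-type orthogonal matrix)."""
    rows = [[1.0 / math.sqrt(n)] * n]              # y_1 = sqrt(n) * xbar
    for k in range(2, n + 1):
        scale = math.sqrt(k * (k - 1))
        # y_k = [(k - 1) x_k - (x_1 + ... + x_{k-1})] / sqrt(k (k - 1))
        rows.append([-1.0 / scale] * (k - 1) + [(k - 1) / scale] + [0.0] * (n - k))
    return rows

def gram(rows):
    """Inner products between rows; equal to the identity iff the rows are orthonormal."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

def transform(rows, x):
    """Apply the transformation: y = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in rows]

x = [2.0, 4.0, 4.0, 5.0, 10.0]
rows = helmert_rows(len(x))
y = transform(rows, x)
print(y[0], math.sqrt(len(x)) * sum(x) / len(x))   # y_1 equals sqrt(n) * xbar
```

Orthogonality preserves sums of squares, so $\sum y_k^2 = \sum x_k^2$; this is the property that makes $Y_2, \ldots, Y_n$ iid $N(0,1)$ when the $X_i$ have unit variance.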


A rigorous definition of the concept involved in the above discussion requires the notion 
of a conditional distribution and is beyond the scope of this book. In view of the discussion 
of conditional probability distributions in Section 4.2, the following definition will suffice 
for our purposes. 


Definition 1. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a sample from $\{F_\theta : \theta \in \Theta\}$. A statistic $T =
T(\mathbf{X})$ is sufficient for $\theta$ or for the family of distributions $\{F_\theta : \theta \in \Theta\}$ if and only if the
conditional distribution of $\mathbf{X}$, given $T = t$, does not depend on $\theta$ (except perhaps for a null
set $A$, $P_\theta\{T \in A\} = 0$ for all $\theta$).


Remark 1. The outcome $X_1, X_2, \ldots, X_n$ is always sufficient, but we will exclude this trivial
statistic from consideration. According to Definition 1, if $T$ is sufficient for $\theta$, we need
only concentrate on $T$ since it exhausts all the information that the sample has about $\theta$.
In practice, there will be several sufficient statistics for a family of distributions, and the
question arises as to which of these should be used in a given problem. We will return to
this topic in more detail later in this section.


Example 2. We show that the statistic $Y_1$ in Example 1 is sufficient for $\mu$. By construction
$Y_2, \ldots, Y_n$ are iid $N(0,1)$ RVs that are independent of $Y_1$. Hence the conditional distribution
of $Y_2, \ldots, Y_n$, given $Y_1 = \sqrt{n}\,\bar{x}$, is the same as the unconditional distribution of $(Y_2, \ldots, Y_n)$,
which is multivariate normal with mean $(0, 0, \ldots, 0)$ and dispersion matrix $I_{n-1}$. Since this
distribution is independent of $\mu$, the conditional distribution of $(Y_1, Y_2, \ldots, Y_n)$, and hence
$(X_1, X_2, \ldots, X_n)$, given $Y_1 = y_1$, is also independent of $\mu$ and $Y_1$ is sufficient.


Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $b(1,p)$ RVs. Intuitively, if a loaded coin is tossed
with probability $p$ of heads $n$ times, it seems unnecessary to know which toss resulted in
a head. To estimate $p$, it should be sufficient to know the number of heads in $n$ trials. We
show that this is consistent with our definition. Let $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n X_i$. Then

$$P\left\{X_1 = x_1, \ldots, X_n = x_n \,\Big|\, \sum_{i=1}^n X_i = t\right\} = \frac{P\{X_1 = x_1, \ldots, X_n = x_n, T = t\}}{\binom{n}{t}p^t(1-p)^{n-t}}$$

if $\sum_{i=1}^n x_i = t$, and $= 0$ otherwise. Thus, for $\sum_{i=1}^n x_i = t$, we have

$$P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\} = \frac{p^{\sum x_i}(1-p)^{n - \sum x_i}}{\binom{n}{t}p^t(1-p)^{n-t}} = \frac{1}{\binom{n}{t}},$$

which is independent of $p$. It is therefore sufficient to concentrate on $\sum_{i=1}^n X_i$.


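Example 4. Let $X_1, X_2$ be iid $P(\lambda)$ RVs. Then $X_1 + X_2$ is sufficient for $\lambda$, for

$$P\{X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = t\} = \begin{cases} \dfrac{P\{X_1 = x_1, X_2 = t - x_1\}}{P\{X_1 + X_2 = t\}} & \text{if } t = x_1 + x_2,\ x_i = 0, 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases}$$

Thus, for $x_i = 0, 1, 2, \ldots$, $i = 1, 2$, $x_1 + x_2 = t$, we have

$$P\{X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = t\} = \binom{t}{x_1}\left(\frac{1}{2}\right)^t,$$

which is independent of $\lambda$.


Not every statistic is sufficient.

Example 5. Let $X_1, X_2$ be iid $P(\lambda)$ RVs, and consider the statistic $T = X_1 + 2X_2$. We have

$$P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + 2X_2 = 2\}} = \frac{e^{-\lambda}(\lambda e^{-\lambda})}{P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 2, X_2 = 0\}}$$

$$= \frac{\lambda e^{-2\lambda}}{\lambda e^{-2\lambda} + (\lambda^2/2)e^{-2\lambda}} = \frac{1}{1 + (\lambda/2)},$$

and we see that $X_1 + 2X_2$ is not sufficient for $\lambda$.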
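Examples 4 and 5 can be checked by computing the conditional probabilities directly for several values of $\lambda$. The sketch below is a minimal illustration; the function names are chosen here and are not part of the text. For $T = X_1 + X_2$ the conditional probability does not change with $\lambda$, whereas for $T = X_1 + 2X_2$ it does.

```python
import math

def pois(k: int, lam: float) -> float:
    """Poisson PMF with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def cond_prob(x1: int, x2: int, lam: float, stat) -> float:
    """P{X1 = x1, X2 = x2 | stat(X1, X2) = stat(x1, x2)} for iid P(lam) RVs."""
    t = stat(x1, x2)
    joint = pois(x1, lam) * pois(x2, lam)
    # Both statistics used below satisfy stat(a, b) >= max(a, b),
    # so searching 0 <= a, b <= t covers the whole event {stat = t}.
    marginal = sum(pois(a, lam) * pois(b, lam)
                   for a in range(t + 1) for b in range(t + 1) if stat(a, b) == t)
    return joint / marginal

def total(a: int, b: int) -> int:
    return a + b

def weighted(a: int, b: int) -> int:
    return a + 2 * b

for lam in (0.5, 2.0):
    print(lam, cond_prob(0, 1, lam, total), cond_prob(0, 1, lam, weighted))
```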


Definition 1 is not a constructive definition since it requires that we first guess a statistic
$T$ and then check to see whether $T$ is sufficient. Moreover, the procedure for checking that
$T$ is sufficient is quite time-consuming. We now give a criterion for determining sufficient
statistics.


Theorem 1 (The Factorization Criterion). Let $X_1, X_2, \ldots, X_n$ be discrete RVs with PMF
$p_\theta(x_1, x_2, \ldots, x_n)$, $\theta \in \Theta$. Then $T(X_1, X_2, \ldots, X_n)$ is sufficient for $\theta$ if and only if we can
write

$$p_\theta(x_1, x_2, \ldots, x_n) = h(x_1, x_2, \ldots, x_n)\,g_\theta(T(x_1, x_2, \ldots, x_n)), \tag{1}$$

where $h$ is a nonnegative function of $x_1, x_2, \ldots, x_n$ only and does not depend on $\theta$, and
$g_\theta$ is a nonnegative nonconstant function of $\theta$ and $T(x_1, x_2, \ldots, x_n)$ only. The statistic
$T(X_1, \ldots, X_n)$ and parameter $\theta$ may be multidimensional.


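Proof. Let $T$ be sufficient for $\theta$. Then $P\{\mathbf{X} = \mathbf{x} \mid T = t\}$ is independent of $\theta$, and we
may write

$$P_\theta\{\mathbf{X} = \mathbf{x}\} = P_\theta\{\mathbf{X} = \mathbf{x}, T(X_1, X_2, \ldots, X_n) = t\} = P_\theta\{T = t\}\,P\{\mathbf{X} = \mathbf{x} \mid T = t\},$$

provided that $P\{\mathbf{X} = \mathbf{x} \mid T = t\}$ is well defined.

For values of $\mathbf{x}$ for which $P_\theta\{\mathbf{X} = \mathbf{x}\} = 0$ for all $\theta$, let us define $h(x_1, x_2, \ldots, x_n) = 0$,
and for $\mathbf{x}$ for which $P_\theta\{\mathbf{X} = \mathbf{x}\} > 0$ for some $\theta$, we define

$$h(x_1, x_2, \ldots, x_n) = P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\}$$

and

$$g_\theta(T(x_1, x_2, \ldots, x_n)) = P_\theta\{T(X_1, \ldots, X_n) = t\}.$$

Thus we see that (1) holds.

Conversely, suppose that (1) holds. Then for fixed $t_0$ we have

$$P_\theta\{T = t_0\} = \sum_{(\mathbf{x}:\, T(\mathbf{x}) = t_0)} P_\theta\{\mathbf{X} = \mathbf{x}\} = \sum_{(\mathbf{x}:\, T(\mathbf{x}) = t_0)} g_\theta(T(\mathbf{x}))h(\mathbf{x}) = g_\theta(t_0)\sum_{T(\mathbf{x}) = t_0} h(\mathbf{x}).$$

Suppose that $P_\theta\{T = t_0\} > 0$ for some $\theta$. Then

$$P_\theta\{\mathbf{X} = \mathbf{x} \mid T = t_0\} = \frac{P_\theta\{\mathbf{X} = \mathbf{x}, T(\mathbf{X}) = t_0\}}{P_\theta\{T(\mathbf{X}) = t_0\}} = \begin{cases} 0 & \text{if } T(\mathbf{x}) \neq t_0, \\ \dfrac{P_\theta\{\mathbf{X} = \mathbf{x}\}}{P_\theta\{T(\mathbf{X}) = t_0\}} & \text{if } T(\mathbf{x}) = t_0. \end{cases}$$

Thus, if $T(\mathbf{x}) = t_0$, then

$$\frac{P_\theta\{\mathbf{X} = \mathbf{x}\}}{P_\theta\{T(\mathbf{X}) = t_0\}} = \frac{g_\theta(t_0)h(\mathbf{x})}{g_\theta(t_0)\sum_{T(\mathbf{x}) = t_0} h(\mathbf{x})},$$

which is free of $\theta$, as asserted. This completes the proof.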


Remark 2. Theorem 1 holds also for the continuous case and, indeed, for quite arbitrary
families of distributions. The general proof is beyond the scope of this book, and we refer
the reader to Halmos and Savage [41] or to Lehmann [64, pp. 53-56]. We will assume that
the result holds for the absolutely continuous case. We leave the reader to write the analog
of (1) and to prove it, at least under the regularity conditions assumed in Theorem 4.4.2.


Remark 3. Theorem 1 (and its analog for the continuous case) holds if $\theta$ is a vector of
parameters and $T$ is a multiple RV, and we say that $T$ is jointly sufficient for $\theta$. We emphasize
that, even if $\theta$ is scalar, $T$ may be multidimensional (Example 9). If $\theta$ and $T$ are of
the same dimension, and if $T$ is sufficient for $\theta$, it does not follow that the $j$th component
of $T$ is sufficient for the $j$th component of $\theta$ (Example 8). The converse is true under mild
conditions (see Fraser [32, p. 21]).


Remark 4. If $T$ is sufficient for $\theta$, any one-to-one function of $T$ is also sufficient. This
follows from Theorem 1: if $U = k(T)$ is a one-to-one function of $T$, then $t = k^{-1}(u)$ and
we can write

$$f_\theta(\mathbf{x}) = g_\theta(t)h(\mathbf{x}) = g_\theta(k^{-1}(u))h(\mathbf{x}) = g_\theta^*(u)h(\mathbf{x}).$$

If $T_1, T_2$ are two distinct sufficient statistics, then

$$f_\theta(\mathbf{x}) = g_\theta(t_1)h_1(\mathbf{x}) = g_\theta(t_2)h_2(\mathbf{x}),$$

and it follows that $T_1$ is a function of $T_2$. It does not follow, however, that every function of
a sufficient statistic is itself sufficient. For example, in sampling from a normal population,
$\bar{X}$ is sufficient for the mean $\mu$ but $\bar{X}^2$ is not. Note that $\bar{X}$ is sufficient for $\mu^2$.


Remark 5. As a rule, Theorem 1 cannot be used to show that a given statistic $T$ is not
sufficient. To do this, one would normally have to use the definition of sufficiency. In
most cases Theorem 1 will lead to a sufficient statistic if it exists.


Remark 6. If $T(\mathbf{X})$ is sufficient for $\{F_\theta : \theta \in \Theta\}$, then $T$ is sufficient for $\{F_\theta : \theta \in \omega\}$,
where $\omega \subseteq \Theta$. This follows trivially from the definition.

Example 6. Let $X_1, X_2, \ldots, X_n$ be iid $b(1,p)$ RVs. Then $T = \sum_{i=1}^n X_i$ is sufficient. We have

$$P_p\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} = p^{\sum x_i}(1-p)^{n - \sum x_i}$$

and, taking

$$h(x_1, x_2, \ldots, x_n) = 1 \quad \text{and} \quad g_p(x_1, x_2, \ldots, x_n) = (1-p)^n\left(\frac{p}{1-p}\right)^{\sum x_i},$$

we see that $T$ is sufficient. We note that $T_1(\mathbf{X}) = (X_1, X_2 + X_3 + \cdots + X_n)$ and $T_2(\mathbf{X}) =
(X_1 + X_2, X_3, X_4 + X_5 + \cdots + X_n)$ are also sufficient for $p$ although $T$ is preferable to $T_1$
or $T_2$.
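The factorization used in Example 6 can be verified by brute force for a small sample size. The sketch below enumerates all binary samples of length 4 and checks that the joint PMF equals $h(\mathbf{x})\,g_p(t)$ with $h \equiv 1$ and $g_p(t) = (1-p)^n[p/(1-p)]^t$, $t = \sum x_i$. The function names and the choice $n = 4$ are illustrative.

```python
from itertools import product

def joint_pmf(x, p: float) -> float:
    """Joint PMF of iid b(1, p) observations x = (x_1, ..., x_n)."""
    t = sum(x)
    return p ** t * (1.0 - p) ** (len(x) - t)

def g(t: int, n: int, p: float) -> float:
    """Factor depending on the data only through t = sum(x): (1 - p)^n (p / (1 - p))^t."""
    return (1.0 - p) ** n * (p / (1.0 - p)) ** t

def h(x) -> float:
    """Factor free of p; identically 1 in this example."""
    return 1.0

n = 4
worst = 0.0
for p in (0.2, 0.5, 0.9):
    for x in product((0, 1), repeat=n):
        worst = max(worst, abs(joint_pmf(x, p) - h(x) * g(sum(x), n, p)))
print(worst)   # largest discrepancy over all samples and all p tried
```

Any two samples with the same number of successes receive the same probability for every $p$, which is the content of sufficiency of $T$ in this model.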


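Example 7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PMF

$$P_N\{X_i = k\} = \frac{1}{N}, \quad k = 1, 2, \ldots, N; \ i = 1, 2, \ldots, n.$$

Then

$$P_N\{X_1 = k_1, X_2 = k_2, \ldots, X_n = k_n\} = \frac{1}{N^n} \quad \text{if } 1 \leq k_1, \ldots, k_n \leq N,$$

$$= \frac{1}{N^n}\,\varphi\Bigl(1, \min_{1 \leq i \leq n} k_i\Bigr)\,\varphi\Bigl(\max_{1 \leq i \leq n} k_i, N\Bigr),$$

where $\varphi(a, b) = 1$ if $b \geq a$, and $= 0$ if $b < a$. It follows, by taking $g_N[\max(k_1, \ldots, k_n)] =
(1/N^n)\varphi(\max_{1 \leq i \leq n} k_i, N)$ and $h = \varphi(1, \min k_i)$, that $\max(X_1, X_2, \ldots, X_n)$ is sufficient for
the family of joint PMFs $P_N$.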


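Example 8. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are
unknown. The joint PDF of $(X_1, X_2, \ldots, X_n)$ is

$$f_{\mu,\sigma^2}(\mathbf{x}) = \frac{1}{(\sigma\sqrt{2\pi})^n} \exp\left\{-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right\}
= \frac{1}{(\sigma\sqrt{2\pi})^n} \exp\left(-\frac{\sum_{i=1}^n x_i^2}{2\sigma^2} + \frac{\mu \sum_{i=1}^n x_i}{\sigma^2} - \frac{n\mu^2}{2\sigma^2}\right).$$

It follows that the statistic

$$T(X_1, \ldots, X_n) = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$$

is jointly sufficient for the parameter $(\mu, \sigma^2)$. An equivalent sufficient statistic that is
frequently used is $T_1(X_1, \ldots, X_n) = (\bar{X}, S^2)$. Note that $\bar{X}$ is not sufficient for $\mu$ if $\sigma^2$
is unknown, and $S^2$ is not sufficient for $\sigma^2$ if $\mu$ is unknown. If, however, $\sigma^2$ is known,
$\bar{X}$ is sufficient for $\mu$. If $\mu = \mu_0$ is known, $\sum_{i=1}^n (X_i - \mu_0)^2$ is sufficient for $\sigma^2$.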
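
The factorization in Example 8 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the two illustrative samples are assumptions chosen so that both samples share the same value of $(\sum x_i, \sum x_i^2)$. It evaluates the log-likelihood term by term and again from the statistic alone.

```python
import math

def loglik_direct(xs, mu, sigma):
    """Log of the joint N(mu, sigma^2) density, evaluated term by term."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in xs
    )

def loglik_from_statistic(n, t1, t2, mu, sigma):
    """Same log-likelihood, using only n, t1 = sum(x) and t2 = sum(x^2)."""
    return (
        -n * math.log(sigma * math.sqrt(2 * math.pi))
        - t2 / (2 * sigma**2)
        + mu * t1 / sigma**2
        - n * mu**2 / (2 * sigma**2)
    )

# Two different (hypothetical) samples with the same (sum x, sum x^2) = (12, 62).
sample_a = [1.0, 5.0, 6.0]
sample_b = [2.0, 3.0, 7.0]

for mu, sigma in [(0.0, 1.0), (4.0, 2.5), (-1.0, 0.7)]:
    print(mu, sigma, loglik_direct(sample_a, mu, sigma), loglik_direct(sample_b, mu, sigma))
```

For every $(\mu, \sigma)$ tried, the two samples give the same likelihood, which is what joint sufficiency of $(\sum X_i, \sum X_i^2)$ predicts.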

Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from PDF

$$f_\theta(x) = \begin{cases} \dfrac{1}{\theta}, & x \in \left[-\dfrac{\theta}{2}, \dfrac{\theta}{2}\right], \quad \theta > 0, \\ 0, & \text{otherwise.} \end{cases}$$

The joint PDF of $X_1, X_2, \ldots, X_n$ is given by

$$f_\theta(x_1, x_2, \ldots, x_n) = \frac{1}{\theta^n} I_A(x_1, \ldots, x_n),$$

where

$$A = \left\{(x_1, x_2, \ldots, x_n) : -\frac{\theta}{2} \le \min x_i \le \max x_i \le \frac{\theta}{2}\right\}.$$

It follows that $(X_{(1)}, X_{(n)})$ is sufficient for $\theta$.

We note that the order statistic $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is also sufficient. Note also that the
parameter is one-dimensional, the statistic $(X_{(1)}, X_{(n)})$ is two-dimensional, whereas the
order statistic is n-dimensional.


In Example 9 we saw that the order statistic is sufficient. This is not a mere coincidence.
In fact, if $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ are exchangeable, then the joint PDF of $\mathbf{X}$ is a symmetric
function of its arguments. Thus

$$f_\theta(x_1, x_2, \ldots, x_n) = f_\theta(x_{(1)}, x_{(2)}, \ldots, x_{(n)}),$$

and it follows that the order statistic is sufficient for $f_\theta$.


The concept of sufficiency is frequently used with another concept, called complete- 
ness, which we now define. 


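Definition 2. Let $\{f_\theta(x), \theta \in \Theta\}$ be a family of PDFs (or PMFs). We say that this family
is complete if

$$E_\theta g(X) = 0 \quad\text{for all } \theta \in \Theta,$$

which implies

$$P_\theta\{g(X) = 0\} = 1 \quad\text{for all } \theta \in \Theta.$$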


Definition 3. A statistic T(X) is said to be complete if the family of distributions of T is 
complete. 


In Definition 3, $\mathbf{X}$ will usually be a multiple RV. The family of distributions of T is
obtained from the family of distributions of $X_1, X_2, \ldots, X_n$ by the usual transformation
technique discussed in Section 4.4.

Example 10. Let $X_1, X_2, \ldots, X_n$ be iid $b(1,p)$ RVs. Then $T = \sum_{i=1}^n X_i$ is a sufficient statistic.
We show that T is also complete, that is, the family of distributions of T, $\{b(n,p), 0 <
p < 1\}$, is complete. The condition

$$E_p g(T) = \sum_{t=0}^n g(t)\binom{n}{t} p^t (1-p)^{n-t} = 0 \quad\text{for all } p \in (0,1)$$

may be rewritten as

$$(1-p)^n \sum_{t=0}^n g(t)\binom{n}{t} \left(\frac{p}{1-p}\right)^t = 0 \quad\text{for all } p \in (0,1).$$

This is a polynomial in $p/(1-p)$. Hence the coefficients must vanish, and it follows that
$g(t) = 0$ for $t = 0, 1, 2, \ldots, n$, as required.
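
The polynomial argument of Example 10 can be illustrated with exact arithmetic. In the sketch below (an illustration only; the helper names and the particular grid of p values are assumptions), the condition $E_p g(T) = 0$ is imposed at just $n+1$ distinct values of p. The resulting square linear system in $g(0), \ldots, g(n)$ has a nonzero determinant, so $g \equiv 0$ is its only solution.

```python
from fractions import Fraction
from math import comb

def binomial_matrix(n, ps):
    """Row i holds the b(n, p_i) probabilities of t = 0, 1, ..., n."""
    return [[comb(n, t) * p**t * (1 - p) ** (n - t) for t in range(n + 1)] for p in ps]

def determinant(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in rows]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det

n = 4
ps = [Fraction(k, n + 2) for k in range(1, n + 2)]  # n + 1 distinct values in (0, 1)
det = determinant(binomial_matrix(n, ps))
print(det != 0)
```

The matrix is a Vandermonde matrix in $p/(1-p)$ up to row and column scaling, which is why distinct values of p suffice.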


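Example 11. Let X be $N(0,\theta)$. Then the family of PDFs $\{N(0,\theta), \theta > 0\}$ is not complete
since $EX = 0$ and $g(x) = x$ is not identically 0. Note that $T(X) = X^2$ is complete, for the
PDF of $X^2 \sim \theta\chi^2(1)$ is given by

$$f(t) = \begin{cases} \dfrac{e^{-t/2\theta}}{\sqrt{2\pi\theta t}}, & t > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$E_\theta g(T) = \frac{1}{\sqrt{2\pi\theta}} \int_0^\infty g(t)\, t^{-1/2} e^{-t/2\theta}\, dt = 0 \quad\text{for all } \theta > 0,$$

which holds if and only if $\int_0^\infty g(t)\, t^{-1/2} e^{-t/2\theta}\, dt = 0$, and using the uniqueness property
of Laplace transforms, it follows that

$$g(t)\, t^{-1/2} = 0 \quad\text{for all } t > 0,$$

that is, $g(t) = 0$.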

The next example illustrates the existence of a sufficient statistic which is not complete.


Example 12. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\theta, \theta^2)$. Then $T = (\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2)$ is
sufficient for $\theta$. However, T is not complete since

$$E_\theta\left\{2\left(\sum_{i=1}^n X_i\right)^2 - (n+1)\sum_{i=1}^n X_i^2\right\} = 0 \quad\text{for all } \theta,$$

and the function $g(x_1, \ldots, x_n) = 2(\sum_{i=1}^n x_i)^2 - (n+1)\sum_{i=1}^n x_i^2$ is not identically 0.
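
The expectation in Example 12 follows from the first two moments of a normal sample. The short sketch below is an illustration under that assumption (the function names are not from the text); it evaluates both expectations exactly with rational arithmetic and also shows that g itself is not the zero function.

```python
from fractions import Fraction

def expected_g(n, theta):
    """E_theta[2 (sum X_i)^2 - (n + 1) sum X_i^2] for iid N(theta, theta^2)."""
    mu, var = theta, theta**2
    e_square_of_sum = n * var + (n * mu) ** 2   # Var(sum) + (E sum)^2
    e_sum_of_squares = n * (var + mu**2)        # n * E X_1^2
    return 2 * e_square_of_sum - (n + 1) * e_sum_of_squares

def g(xs):
    """The function of the sample whose expectation vanishes."""
    n = len(xs)
    return 2 * sum(xs) ** 2 - (n + 1) * sum(x * x for x in xs)

print([expected_g(n, Fraction(3, 2)) for n in range(1, 6)])
print(g([1, 2]))  # a point where g is not zero
```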


Example 13. Let $X \sim U(0,\theta)$, $\theta \in (0,\infty)$. We show that the family of PDFs of X is
complete. We need to show that

$$E_\theta g(X) = \int_0^\theta \frac{1}{\theta}\, g(x)\, dx = 0 \quad\text{for all } \theta > 0$$

if and only if $g(x) = 0$ for all x. In general, this result follows from Lebesgue integration
theory. If g is continuous, we differentiate both sides in

$$\int_0^\theta g(x)\, dx = 0$$

to get $g(\theta) = 0$ for all $\theta > 0$.

Now let $X_1, X_2, \ldots, X_n$ be iid $U(0,\theta)$ RVs. Then the PDF of $X_{(n)}$ is given by

$$f_n(x \mid \theta) = \begin{cases} \dfrac{n x^{n-1}}{\theta^n}, & 0 < x < \theta, \\ 0, & \text{otherwise.} \end{cases}$$

We see by a similar argument that $X_{(n)}$ is complete, which is the same as saying that
$\{f_n(x \mid \theta); \theta > 0\}$ is a complete family of densities. Clearly, $X_{(n)}$ is sufficient.


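Example 14. Let $X_1, X_2, \ldots, X_n$ be a sample from PMF

$$P_N(x) = \begin{cases} \dfrac{1}{N}, & x = 1, 2, \ldots, N, \\ 0, & \text{otherwise.} \end{cases}$$

We first show that the family of PMFs $\{P_N, N \ge 1\}$ is complete. We have

$$E_N g(X) = \frac{1}{N} \sum_{k=1}^N g(k) = 0 \quad\text{for all } N \ge 1,$$

and this happens if and only if $g(k) = 0$, $k = 1, 2, \ldots, N$. Next we consider the family of
PMFs of $X_{(n)} = \max(X_1, \ldots, X_n)$. The PMF of $X_{(n)}$ is given by

$$P_N^{(n)}(x) = \frac{x^n}{N^n} - \frac{(x-1)^n}{N^n}, \qquad x = 1, 2, \ldots, N.$$

Also

$$E_N g(X_{(n)}) = \sum_{k=1}^N g(k) \left[\frac{k^n}{N^n} - \frac{(k-1)^n}{N^n}\right] = 0 \quad\text{for all } N \ge 1.$$

Taking $N = 1$,

$$E_1 g(X_{(n)}) = g(1) = 0$$

implies $g(1) = 0$. Again,

$$E_2 g(X_{(n)}) = \frac{g(1)}{2^n} + g(2)\left(1 - \frac{1}{2^n}\right) = 0,$$

so that $g(2) = 0$.

Using an induction argument, we conclude that $g(1) = g(2) = \cdots = g(N) = 0$ and
hence $g(x) = 0$. It follows that $\{P_N^{(n)}\}$ is a complete family of distributions, and $X_{(n)}$ is a
complete sufficient statistic.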

Now suppose that we exclude the value $N = n_0$ for some fixed $n_0 \ge 1$ from the family
$\{P_N : N \ge 1\}$. Let us write $\mathcal{P} = \{P_N : N \ge 1, N \ne n_0\}$. Then $\mathcal{P}$ is not complete. We ask the
reader to show that the class of all functions g such that $E_P g(X) = 0$ for all $P \in \mathcal{P}$ consists
of functions of the form

$$g(k) = \begin{cases} 0, & k = 1, 2, \ldots, n_0 - 1, n_0 + 2, n_0 + 3, \ldots, \\ c, & k = n_0, \\ -c, & k = n_0 + 1, \end{cases}$$

where c is a constant, $c \ne 0$.
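
The loss of completeness is easy to see with exact arithmetic. The sketch below is an illustration only (the function names and the choice $n_0 = 3$ are assumptions). It evaluates $E_N g(X)$ for the function g displayed above and confirms that the expectation vanishes for every $N \ne n_0$ in a finite range, yet not at $N = n_0$.

```python
from fractions import Fraction

def g(k, n0, c=1):
    """The function from the text: c at n0, -c at n0 + 1, zero elsewhere."""
    if k == n0:
        return c
    if k == n0 + 1:
        return -c
    return 0

def expectation(N, n0, c=1):
    """E_N g(X) when X is uniform on {1, ..., N}."""
    return Fraction(sum(g(k, n0, c) for k in range(1, N + 1)), N)

n0 = 3  # hypothetical excluded value
print([(N, expectation(N, n0)) for N in range(1, 8)])
```

Only the entry for $N = 3$ is nonzero, so g is a nonzero unbiased estimator of zero for the reduced family but not for the full one.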

Remark 7. Completeness is a property of a family of distributions. In Remark 6 we saw
that if a statistic is sufficient for a class of distributions it is sufficient for any subclass of
those distributions. Completeness works in the opposite direction. Example 14 shows that
the exclusion of even one member from the family $\{P_N : N \ge 1\}$ destroys completeness.


The following result covers a large class of probability distributions for which a 
complete sufficient statistic exists. 


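Theorem 2. Let $\{f_\theta : \theta \in \Theta\}$ be a k-parameter exponential family given by

$$f_\theta(\mathbf{x}) = \exp\left\{\sum_{j=1}^k Q_j(\theta) T_j(\mathbf{x}) + D(\theta) + S(\mathbf{x})\right\}, \tag{2}$$

where $\theta = (\theta_1, \theta_2, \ldots, \theta_k) \in \Theta$, an interval in $\mathcal{R}_k$; $T_1, T_2, \ldots, T_k$, and S are defined on
$\mathcal{R}_n$; $T = (T_1, T_2, \ldots, T_k)$; and $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, $k \le n$. Let $Q = (Q_1, Q_2, \ldots, Q_k)$ and
suppose that the range of Q contains an open set in $\mathcal{R}_k$. Then

$$T = (T_1(\mathbf{X}), T_2(\mathbf{X}), \ldots, T_k(\mathbf{X}))$$

is a complete sufficient statistic.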
Proof. For a complete proof in a general setting we refer the reader to Lehmann [64,
pp. 142-143]. Essentially, the unicity of the Laplace transform is used on the probability
distribution induced by T. We will content ourselves here by proving the result for the
$k = 1$ case when $f_\theta$ is a PMF.

Let us write $Q(\theta) = \theta$ in (2), and let $(\alpha, \beta) \subseteq \Theta$. We wish to show that

$$E_\theta g(T(\mathbf{X})) = \sum_t g(t) P_\theta\{T(\mathbf{X}) = t\}
= \sum_t g(t) \exp\{\theta t + D(\theta) + S^*(t)\} = 0 \quad\text{for all } \theta \tag{3}$$

implies that $g(t) = 0$.
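
Let us write $x^+ = x$ if $x \ge 0$, $= 0$ if $x < 0$, and $x^- = -x$ if $x \le 0$, $= 0$ if $x > 0$. Then
$g(t) = g^+(t) - g^-(t)$, and both $g^+$ and $g^-$ are nonnegative functions. In terms of $g^+$ and
$g^-$, (3) is the same as

$$\sum_t g^+(t)\, e^{\theta t + S^*(t)} = \sum_t g^-(t)\, e^{\theta t + S^*(t)} \tag{4}$$

for all $\theta$.

Let $\theta_0 \in (\alpha, \beta)$ be fixed, and write

$$p^+(t) = \frac{g^+(t)\, e^{\theta_0 t + S^*(t)}}{\sum_t g^+(t)\, e^{\theta_0 t + S^*(t)}} \quad\text{and}\quad p^-(t) = \frac{g^-(t)\, e^{\theta_0 t + S^*(t)}}{\sum_t g^-(t)\, e^{\theta_0 t + S^*(t)}}. \tag{5}$$

Then both $p^+$ and $p^-$ are PMFs, and it follows from (4) that

$$\sum_t e^{\delta t} p^+(t) = \sum_t e^{\delta t} p^-(t) \tag{6}$$

for all $\delta \in (\alpha - \theta_0, \beta - \theta_0)$. By the uniqueness of MGFs, (6) implies that

$$p^+(t) = p^-(t) \quad\text{for all } t$$

and hence that $g^+(t) = g^-(t)$ for all t, which is equivalent to $g(t) = 0$ for all t. Since T is
clearly sufficient (by the factorization criterion), it is proved that T is a complete sufficient
statistic.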


Example 15. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown.
We know that the family of distributions of $\mathbf{X} = (X_1, \ldots, X_n)$ is a two-parameter exponential
family with $T(X_1, \ldots, X_n) = (\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2)$. From Theorem 2 it follows that T is a
complete sufficient statistic. Examples 10 and 11 fall in the domain of Theorem 2.


In Examples 6, 8, and 9 we have shown that a given family of probability distributions
that admits a nontrivial sufficient statistic usually admits several sufficient
statistics. Clearly we would like to be able to choose the sufficient statistic that results
in the greatest reduction of data collection. We next study the notion of a minimal
sufficient statistic. For this purpose it is convenient to introduce the notion of
a sufficient partition. The reader will recall that a partition of a space $\mathcal{X}$ is just a
collection of disjoint sets $E_\alpha$ such that $\sum_\alpha E_\alpha = \mathcal{X}$. Any statistic $T(X_1, X_2, \ldots, X_n)$
induces a partition of the space of values of $(X_1, X_2, \ldots, X_n)$, that is, T induces
a covering of $\mathcal{X}$ by a family $\mathfrak{U}$ of disjoint sets $A_t = \{(x_1, x_2, \ldots, x_n) \in \mathcal{X} : T(x_1,
x_2, \ldots, x_n) = t\}$, where t belongs to the range of T. The sets $A_t$ are called partition sets.
Conversely, given a partition, any assignment of a number to each set so that no two partition
sets have the same number assigned defines a statistic. Clearly this function is not,
in general, unique.


Definition 4. Let $\{F_\theta : \theta \in \Theta\}$ be a family of DFs, and $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a
sample from $F_\theta$. Let $\mathfrak{U}$ be a partition of the sample space induced by a statistic
$T = T(X_1, X_2, \ldots, X_n)$. We say that $\mathfrak{U} = \{A_t : t \text{ is in the range of } T\}$ is a sufficient partition
for $\theta$ (or the family $\{F_\theta : \theta \in \Theta\}$) if the conditional distribution of $\mathbf{X}$, given $T = t$,
does not depend on $\theta$ for any $A_t$, provided that the conditional probability is well defined.


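Example 16. Let $X_1, X_2, \ldots, X_n$ be iid $b(1,p)$ RVs. The sample space of values of $(X_1,
X_2, \ldots, X_n)$ is the set of n-tuples $(x_1, x_2, \ldots, x_n)$, where each $x_i = 0$ or $= 1$, and consists of
$2^n$ points. Let $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n X_i$, and consider the partition $\mathfrak{U} = \{A_0, A_1, \ldots, A_n\}$,
where $\mathbf{x} \in A_j$ if and only if $\sum_{i=1}^n x_i = j$, $0 \le j \le n$. Each $A_j$ contains $\binom{n}{j}$ sample points. The
conditional probability

$$P_p\{\mathbf{x} \mid A_j\} = \frac{P_p\{\mathbf{X} = \mathbf{x}\}}{P_p\{T = j\}} = \binom{n}{j}^{-1} \quad\text{if } \mathbf{x} \in A_j,$$

and we see that $\mathfrak{U}$ is a sufficient partition.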
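
The conditional probabilities in Example 16 can be enumerated directly for a small n. The sketch below is an illustration only (the function name and the two values of p are assumptions). It computes $P_p\{\mathbf{x} \mid A_j\}$ exactly for every point of $\{0,1\}^n$ under two different values of p.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_probs(n, p):
    """Map each point x of {0,1}^n to P_p{x | A_j}, where j = sum(x)."""
    joint = {
        x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in product((0, 1), repeat=n)
    }
    total = {}  # P_p{T = j} for each j
    for x, prob in joint.items():
        total[sum(x)] = total.get(sum(x), 0) + prob
    return {x: prob / total[sum(x)] for x, prob in joint.items()}

n = 4
cond_a = conditional_probs(n, Fraction(1, 3))
cond_b = conditional_probs(n, Fraction(4, 5))
print(cond_a == cond_b)
print(cond_a[(1, 0, 1, 0)], Fraction(1, comb(n, 2)))
```

Both dictionaries agree and every entry equals $1/\binom{n}{j}$, so the conditional distribution on each partition set is free of p.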


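Example 17. Let $X_1, X_2, \ldots, X_n$ be iid $U[0,\theta]$ RVs. Consider the statistic $T(\mathbf{X}) =
\max_{1 \le i \le n} X_i$. The space of values of $X_1, X_2, \ldots, X_n$ is the set of points $\{\mathbf{x} : 0 \le x_i \le \theta,
i = 1, 2, \ldots, n\}$. T induces a partition $\mathfrak{U}$ on this set. The sets of this partition are $A_t = \{(x_1,
x_2, \ldots, x_n) : \max(x_1, \ldots, x_n) = t\}$, $t \in [0,\theta]$.

We have

$$f_\theta(\mathbf{x} \mid t) = \frac{f_\theta(\mathbf{x})}{f_\theta(t)} \quad\text{if } \mathbf{x} \in A_t,$$

where $f_\theta(t)$ is the PDF of T. We have

$$f_\theta(\mathbf{x} \mid t) = \frac{1/\theta^n}{n t^{n-1}/\theta^n} = \frac{1}{n t^{n-1}}, \qquad \mathbf{x} \in A_t.$$

It follows that $\mathfrak{U} = \{A_t\}$ defines a sufficient partition.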


Remark 8. Clearly a sufficient statistic T for a family of DFs $\{F_\theta : \theta \in \Theta\}$ induces a
sufficient partition and, conversely, given a sufficient partition, we can define a sufficient
statistic (not necessarily uniquely) for the family.


Remark 9. Two statistics $T_1, T_2$ that define the same partition must be in one-to-one
correspondence, that is, there exists a function h such that $T_1 = h(T_2)$ with a unique inverse,
$T_2 = h^{-1}(T_1)$. It follows that if $T_1$ is sufficient, every one-to-one function of $T_1$ is also
sufficient.


Let $\mathfrak{U}_1, \mathfrak{U}_2$ be two partitions of a space $\mathcal{X}$. We say that $\mathfrak{U}_1$ is a subpartition of $\mathfrak{U}_2$ if every
partition set in $\mathfrak{U}_2$ is a union of sets of $\mathfrak{U}_1$. We sometimes say also that $\mathfrak{U}_1$ is finer than
$\mathfrak{U}_2$ ($\mathfrak{U}_2$ is coarser than $\mathfrak{U}_1$) or that $\mathfrak{U}_2$ is a reduction of $\mathfrak{U}_1$. In this case, a statistic $T_2$ that
defines $\mathfrak{U}_2$ must be a function of any statistic $T_1$ that defines $\mathfrak{U}_1$. Clearly, this function need
not have a unique inverse unless the two partitions have exactly the same partition sets.

Given a family of distributions $\{F_\theta : \theta \in \Theta\}$ for which a sufficient partition exists, we
seek to find a sufficient partition $\mathfrak{U}$ that is as coarse as possible, that is, any reduction of $\mathfrak{U}$
leads to a partition that is not sufficient.

Definition 5. A partition $\mathfrak{U}$ is said to be minimal sufficient if

(i) $\mathfrak{U}$ is a sufficient partition, and
(ii) if $\mathfrak{V}$ is any sufficient partition, $\mathfrak{V}$ is a subpartition of $\mathfrak{U}$.


The question of the existence of the minimal partition was settled by Lehmann and
Scheffé [65] and, in general, involves measure-theoretic considerations. However, in the
cases that we consider, where the sample space is either discrete or a finite-dimensional
Euclidean space and the family of distributions of $\mathbf{X}$ is defined by a family of PDFs (PMFs)
$\{f_\theta, \theta \in \Theta\}$, such difficulties do not arise. The construction may be described as follows.

Two points $\mathbf{x}$ and $\mathbf{y}$ in the sample space are said to be likelihood equivalent, and we
write $\mathbf{x} \sim \mathbf{y}$, if and only if there exists a $k(\mathbf{y}, \mathbf{x}) \ne 0$ which does not depend on $\theta$ such that
$f_\theta(\mathbf{y}) = k(\mathbf{y}, \mathbf{x}) f_\theta(\mathbf{x})$. We leave the reader to check that "$\sim$" is an equivalence relation
(that is, it is reflexive, symmetric, and transitive) and hence "$\sim$" defines a partition of the
sample space. This partition defines the minimal sufficient partition.


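Example 18. Consider again Example 16. Then

$$\frac{f_p(\mathbf{x})}{f_p(\mathbf{y})} = \left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i - \sum_{i=1}^n y_i},$$

and this ratio is independent of p if and only if

$$\sum_{i=1}^n x_i = \sum_{i=1}^n y_i,$$

so that $\mathbf{x} \sim \mathbf{y}$ if and only if $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$. It follows that the partition $\mathfrak{U} = \{A_0, A_1, \ldots, A_n\}$,
where $\mathbf{x} \in A_j$ if and only if $\sum_{i=1}^n x_i = j$, introduced in Example 16 is minimal sufficient.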
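
The likelihood-equivalence construction can be carried out mechanically for the Bernoulli case. In the sketch below (an illustration only; the function names and the finite grid of p values are assumptions) two points are placed in the same class when their likelihoods are proportional across the grid. Agreement on a finite grid is only a necessary condition for $\mathbf{x} \sim \mathbf{y}$ in general, but for this family it already separates the classes.

```python
from fractions import Fraction
from itertools import product

def likelihood(x, p):
    """Joint PMF of a b(1, p) sample at the point x."""
    s = sum(x)
    return p**s * (1 - p) ** (len(x) - s)

def likelihood_classes(n, ps):
    """Group points of {0,1}^n whose likelihoods are proportional over ps."""
    classes = {}
    for x in product((0, 1), repeat=n):
        base = likelihood(x, ps[0])
        # Ratios to the first grid point are unchanged by a constant factor k(y, x).
        signature = tuple(likelihood(x, p) / base for p in ps[1:])
        classes.setdefault(signature, []).append(x)
    return list(classes.values())

ps = [Fraction(1, 4), Fraction(1, 2), Fraction(2, 3)]
classes = likelihood_classes(3, ps)
for cls in classes:
    print(sum(cls[0]), cls)
```

Each class collects exactly the points with a common value of $\sum x_i$, reproducing the partition $\{A_0, \ldots, A_n\}$.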


A rigorous proof of the above assertion is beyond the scope of this book. The basic 
ideas are outlined in the following theorem. 


Theorem 3. The relation “~” defined above induces a minimal sufficient partition. 


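Proof. If T is a sufficient statistic, we have to show that $\mathbf{x} \sim \mathbf{y}$ whenever $T(\mathbf{x}) = T(\mathbf{y})$.
This will imply that every set of the minimal sufficient partition is a union of sets of the
form $A_t = \{T = t\}$, proving condition (ii) of Definition 5.

Sufficiency of T means that whenever $\mathbf{x} \in A_t$, then

$$f_\theta(\mathbf{x} \mid T = t) = \frac{f_\theta(\mathbf{x})}{f_\theta(t)} \quad\text{if } \mathbf{x} \in A_t$$

is free of $\theta$. It follows that if both $\mathbf{x}$ and $\mathbf{y} \in A_t$, then

$$\frac{f_\theta(\mathbf{x} \mid t)}{f_\theta(\mathbf{y} \mid t)} = \frac{f_\theta(\mathbf{x})}{f_\theta(\mathbf{y})}$$

is independent of $\theta$, and hence $\mathbf{x} \sim \mathbf{y}$.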
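
To prove the sufficiency of the minimal sufficient partition $\mathfrak{U}$, let $T_1$ be an RV that
induces $\mathfrak{U}$. Then $T_1$ takes on distinct values over distinct sets of $\mathfrak{U}$ but remains constant on
the same set. If $\mathbf{x} \in \{T_1 = t_1\}$, then

$$f_\theta(\mathbf{x} \mid T_1 = t_1) = \frac{f_\theta(\mathbf{x})}{P_\theta\{T_1 = t_1\}}. \tag{7}$$

Now

$$P_\theta\{T_1 = t_1\} = \int_{(\mathbf{y} : T_1(\mathbf{y}) = t_1)} f_\theta(\mathbf{y})\, d\mathbf{y} \quad\text{or}\quad \sum_{(\mathbf{y} : T_1(\mathbf{y}) = t_1)} f_\theta(\mathbf{y}),$$

depending on whether the joint distribution of $\mathbf{X}$ is absolutely continuous or discrete. Since
$f_\theta(\mathbf{x})/f_\theta(\mathbf{y})$ is independent of $\theta$ whenever $\mathbf{x} \sim \mathbf{y}$, it follows that the ratio on the right-hand
side of (7) does not depend on $\theta$. Thus $T_1$ is sufficient.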


Definition 6. A statistic that induces the minimal sufficient partition is called a minimal 
sufficient statistic. 


In view of Theorem 3 a minimal sufficient statistic is a function of every sufficient
statistic. It follows that if $T_1$ and $T_2$ are both minimal sufficient, then both must induce the
same minimal sufficient partition and hence $T_1$ and $T_2$ must be equivalent in the sense that
each must be a function of the other (with probability 1).

How does one show that a statistic T is not sufficient for a family of distributions $\mathcal{P}$?
Other than using the definition of sufficiency one can sometimes use a result of Lehmann
and Scheffé [65] according to which if $T_1(\mathbf{X})$ is sufficient for $\theta$, $\theta \in \Theta$, then $T_2(\mathbf{X})$ is also
sufficient if and only if $T_1(\mathbf{x}) = g(T_2(\mathbf{x}))$ for some Borel-measurable function g and all
$\mathbf{x} \in B$, where B is a Borel set with $P_\theta B = 1$.

Another way to prove T nonsufficient is to show that there exist $\mathbf{x}$ and $\mathbf{y}$ for which $T(\mathbf{x}) =
T(\mathbf{y})$ but $\mathbf{x}$ and $\mathbf{y}$ are not likelihood equivalent. We refer to Sampson and Spencer [98] for
this and other similar results.

The following important result will be proved in the next section. 


Theorem 4. A complete sufficient statistic is minimal sufficient. 


We emphasize that the converse is not true. A minimal sufficient statistic may not be 
complete. 


Example 19. Suppose $X \sim U(\theta, \theta + 1)$. Then X is a minimal sufficient statistic. However,
X is not complete. Take for example $g(x) = \sin 2\pi x$. Then

$$E_\theta g(X) = \int_\theta^{\theta+1} \sin 2\pi x\, dx = \int_0^1 \sin 2\pi x\, dx = 0$$

for all $\theta$, and it follows that X is not complete.
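
The vanishing expectation in Example 19 is a consequence of integrating a periodic function over one full period. The sketch below is a numerical illustration only (the function name, the midpoint rule, and the step count are assumptions). It approximates the integral for several values of $\theta$.

```python
import math

def expected_sine(theta, steps=20_000):
    """Midpoint-rule approximation of the integral of sin(2*pi*x) over (theta, theta + 1)."""
    h = 1.0 / steps
    return h * sum(math.sin(2 * math.pi * (theta + (i + 0.5) * h)) for i in range(steps))

for theta in (-1.3, 0.0, 0.25, 2.7):
    print(theta, expected_sine(theta))
```

Every value is zero up to rounding error, although $g(x) = \sin 2\pi x$ itself is not zero (for instance $g(0.25) = 1$).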

If $X_1, X_2, \ldots, X_n$ is a sample from $U(\theta, \theta + 1)$, then $(X_{(1)}, X_{(n)})$ is minimal sufficient for
$\theta$ but not complete since

$$E_\theta(X_{(n)} - X_{(1)}) = \frac{n-1}{n+1}$$

for all $\theta$.


Finally, we consider statistics that have distributions free of the parameter(s) $\theta$ and
seem to contain no information about $\theta$. We will see (Example 23) that such statistics can
sometimes provide useful information about $\theta$.


Definition 7. A statistic $A(\mathbf{X})$ is said to be ancillary if its distribution does not depend on
the underlying model parameter $\theta$.


Example 20. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, 1)$. Then the statistic
$A(\mathbf{X}) = (n-1)S^2 = \sum_{i=1}^n (X_i - \bar{X})^2$ is ancillary since $(n-1)S^2 \sim \chi^2(n-1)$, which is
free of $\mu$. Some other ancillary statistics are

$$X_1 - \bar{X}, \qquad X_{(n)} - X_{(1)}, \qquad \sum_{i=1}^n |X_i - \bar{X}|.$$

Also, $\bar{X}$, a complete sufficient statistic (hence minimal sufficient) for $\mu$, is independent
of $A(\mathbf{X})$.


Example 21. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(0, \sigma^2)$. Then $A(\mathbf{X}) = \bar{X}$
follows a $N(0, n^{-1}\sigma^2)$ distribution and is not ancillary with respect to the parameter $\sigma^2$.


Example 22. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of a random sample from the
PDF $f(x - \theta)$, where $\theta \in \mathcal{R}$. Then the statistic $A(\mathbf{X}) = (X_{(2)} - X_{(1)}, \ldots, X_{(n)} - X_{(1)})$ is
ancillary for $\theta$.


In Example 20 we saw that $S^2$ was independent of the minimal sufficient statistic $\bar{X}$.
The following result due to Basu shows that it is not a mere coincidence.


Theorem 5. If $S(\mathbf{X})$ is a complete sufficient statistic for $\theta$, then any ancillary statistic
$A(\mathbf{X})$ is independent of S.


Proof. If A is ancillary, then $P_\theta\{A(\mathbf{X}) \le a\}$ is free of $\theta$ for all a. Consider the conditional
probability $g_a(s) = P\{A(\mathbf{X}) \le a \mid S(\mathbf{X}) = s\}$. Clearly

$$E_\theta\{g_a(S(\mathbf{X}))\} = P_\theta\{A(\mathbf{X}) \le a\}.$$

Thus

$$E_\theta\big(g_a(S) - P\{A(\mathbf{X}) \le a\}\big) = 0$$

for all $\theta$. By completeness of S it follows that

$$P_\theta\big\{g_a(S) - P\{A \le a\} = 0\big\} = 1,$$

that is,

$$P_\theta\{A(\mathbf{X}) \le a \mid S(\mathbf{X}) = s\} = P\{A(\mathbf{X}) \le a\},$$

with probability 1. Hence A and S are independent.
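
Basu's theorem can be probed by simulation in the setting of Example 20. The sketch below is an illustration only: the function names, sample size, number of replications, and seed are assumptions, and a sample correlation near zero is merely consistent with independence rather than a proof of it. It estimates the correlation between $\bar{X}$ and $\sum (X_i - \bar{X})^2$ under $N(\mu, 1)$.

```python
import random

def mean_and_ss(sample):
    """Return (sample mean, sum of squared deviations about the mean)."""
    xbar = sum(sample) / len(sample)
    return xbar, sum((x - xbar) ** 2 for x in sample)

def simulated_correlation(mu, n=5, reps=20_000, seed=0):
    """Sample correlation between X-bar and sum (X_i - X-bar)^2 under N(mu, 1)."""
    rng = random.Random(seed)
    pairs = [mean_and_ss([rng.gauss(mu, 1.0) for _ in range(n)]) for _ in range(reps)]
    m1 = sum(a for a, _ in pairs) / reps
    m2 = sum(b for _, b in pairs) / reps
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / reps
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / reps
    v2 = sum((b - m2) ** 2 for _, b in pairs) / reps
    return cov / (v1 * v2) ** 0.5

print(simulated_correlation(0.0), simulated_correlation(3.0))
```

Both estimates are small relative to their sampling error, whatever the value of $\mu$.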


The converse of Basu's Theorem is not true. A statistic S that is independent of every
ancillary statistic need not be complete (see, for example, Lehmann [62]).

The following example due to R.A. Fisher shows that if there is no sufficient statistic
for $\theta$, but there exists a reasonable statistic not independent of an ancillary statistic
$A(\mathbf{X})$, then the recovery of information is sometimes helped by the ancillary statistic via
a conditional analysis. Unfortunately, the lack of uniqueness of ancillary statistics creates
problems with this conditional analysis.


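Example 23. Let $X_1, X_2, \ldots, X_n$ be a random sample from an exponential distribution with
mean $\theta$, and let $Y_1, Y_2, \ldots, Y_n$ be another random sample from an exponential distribution
with mean $1/\theta$. Assume X's and Y's are independent and consider the problem of estimation
of $\theta$ based on the observations $(X_1, X_2, \ldots, X_n; Y_1, Y_2, \ldots, Y_n)$. Let $S_1(\mathbf{x}) = \sum_{i=1}^n x_i$ and
$S_2(\mathbf{y}) = \sum_{i=1}^n y_i$. Then $(S_1(\mathbf{X}), S_2(\mathbf{Y}))$ is jointly sufficient for $\theta$. It is easily seen that
$(S_1, S_2)$ is a minimal sufficient statistic for $\theta$.

Consider the statistics

$$S(\mathbf{X}, \mathbf{Y}) = \big(S_1(\mathbf{X})/S_2(\mathbf{Y})\big)^{1/2}$$

and

$$A(\mathbf{X}, \mathbf{Y}) = \big(S_1(\mathbf{X})\, S_2(\mathbf{Y})\big)^{1/2}.$$

Then the joint PDF of S and A is given by

$$\frac{2}{[\Gamma(n)]^2} \exp\left\{-A(\mathbf{x}, \mathbf{y}) \left(\frac{S(\mathbf{x}, \mathbf{y})}{\theta} + \frac{\theta}{S(\mathbf{x}, \mathbf{y})}\right)\right\} \frac{[A(\mathbf{x}, \mathbf{y})]^{2n-1}}{S(\mathbf{x}, \mathbf{y})},$$

and it is clear that S and A are not independent. The marginal distribution of A is given by
the PDF

$$C(\mathbf{x}, \mathbf{y})\,[A(\mathbf{x}, \mathbf{y})]^{2n-1},$$

where $C(\mathbf{x}, \mathbf{y})$ is the constant of integration which depends only on $\mathbf{x}$, $\mathbf{y}$, and n but not
on $\theta$. In fact, $C(\mathbf{x}, \mathbf{y}) = 4K_0[2A(\mathbf{x}, \mathbf{y})]/[\Gamma(n)]^2$, where $K_0$ is the standard form of a Bessel
function (Watson [116]). Consequently A is ancillary for $\theta$.

Clearly, the conditional PDF of S given $A = a$ is of the form

$$\frac{1}{2K_0[2a]\, S(\mathbf{x}, \mathbf{y})} \exp\left\{-a \left(\frac{S(\mathbf{x}, \mathbf{y})}{\theta} + \frac{\theta}{S(\mathbf{x}, \mathbf{y})}\right)\right\}.$$

The amount of information lost by using $S(\mathbf{X}, \mathbf{Y})$ alone is $\left(\frac{1}{2n+1}\right)$th part of the total, and
this loss of information is gained by the knowledge of the ancillary statistic $A(\mathbf{X}, \mathbf{Y})$.
These calculations will be discussed in Example 8.5.9.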


PROBLEMS 8.3 


1. Find a sufficient statistic in each of the following cases based on a random sample
of size n:
(a) $X \sim B(\alpha, \beta)$ when (i) $\alpha$ is unknown, $\beta$ known; (ii) $\beta$ is unknown, $\alpha$ known; and
(iii) $\alpha, \beta$ are both unknown.
(b) $X \sim G(\alpha, \beta)$ when (i) $\alpha$ is unknown, $\beta$ known; (ii) $\beta$ is unknown, $\alpha$ known; and
(iii) $\alpha, \beta$ are both unknown.
(c) $X \sim P_{N_1,N_2}(x)$, where

$$P_{N_1,N_2}(x) = \frac{1}{N_2 - N_1}, \qquad x = N_1 + 1, N_1 + 2, \ldots, N_2,$$

and $N_1, N_2$ ($N_1 < N_2$) are integers, when (i) $N_1$ is known, $N_2$ unknown; (ii) $N_2$
known, $N_1$ unknown; and (iii) $N_1, N_2$ are both unknown.
(d) $X \sim f_\theta(x)$, where

$$f_\theta(x) = \begin{cases} e^{-x+\theta} & \text{if } \theta < x < \infty, \\ 0 & \text{otherwise.} \end{cases}$$

(e) $X \sim f(x; \mu, \sigma)$, where

$$f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2\sigma^2}(\log x - \mu)^2\right\}, \qquad x > 0.$$

(f) $X \sim f_\theta(x)$, where

$$f_\theta(x) = P_\theta\{X = x\} = c(\theta)\, 2^{-x/\theta}, \qquad x = \theta, \theta + 1, \ldots, \quad \theta > 0,$$

and

$$c(\theta) = 2^{1 - 1/\theta}(2^{1/\theta} - 1).$$

(g) $X \sim P_{\theta,p}(x)$, where

$$P_{\theta,p}(x) = (1-p)\, p^{x-\theta}, \qquad x = \theta, \theta + 1, \ldots, \quad 0 < p < 1,$$

when (i) p is known, $\theta$ unknown; (ii) p is unknown, $\theta$ known; and (iii) $p, \theta$ are
both unknown.


2. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a sample from $N(\alpha\sigma, \sigma^2)$, where $\alpha$ is a known real
number. Show that the statistic $T(\mathbf{X}) = (\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2)$ is sufficient for $\sigma$ but
that the family of distributions of $T(\mathbf{X})$ is not complete.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. Then $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is clearly
sufficient for the family $N(\mu, \sigma^2)$, $\mu \in \mathcal{R}$, $\sigma > 0$. Is the family of distributions of $\mathbf{X}$
complete?

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(\theta - \frac{1}{2}, \theta + \frac{1}{2})$, $\theta \in \mathcal{R}$. Show that the statistic
$T(X_1, \ldots, X_n) = (\min X_i, \max X_i)$ is sufficient for $\theta$ but not complete.

5. If $T = g(U)$ and T is sufficient, then so also is U.

6. In Example 14 show that the class of all functions g for which $E_P g(X) = 0$ for all
$P \in \mathcal{P}$ consists of functions of the form

$$g(k) = \begin{cases} 0, & k = 1, 2, \ldots, n_0 - 1, n_0 + 2, n_0 + 3, \ldots, \\ c, & k = n_0, \\ -c, & k = n_0 + 1, \end{cases}$$

where c is a constant.

7. For the class $\{F_{\theta_1}, F_{\theta_2}\}$ of two DFs where $F_{\theta_1}$ is $N(0,1)$ and $F_{\theta_2}$ is $C(1,0)$, find a
sufficient statistic.

8. Consider the class of hypergeometric probability distributions $\{P_D : D =
0, 1, 2, \ldots, N\}$, where

$$P_D\{X = x\} = \binom{N}{n}^{-1} \binom{D}{x} \binom{N-D}{n-x}, \qquad x = 0, 1, \ldots, \min\{n, D\}.$$

Show that it is a complete class. If $\mathcal{P} = \{P_D : D = 0, 1, 2, \ldots, N, D \ne d, d \text{ integral}, 0 \le
d \le N\}$, is $\mathcal{P}$ complete?

9. Is the family of distributions of the order statistic in sampling from a Poisson
distribution complete?

10. Let $(X_1, X_2, \ldots, X_n)$ be a random vector of the discrete type. Is the statistic
$T(X_1, \ldots, X_n) = (X_1, \ldots, X_{n-1})$ sufficient?

11. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with law $\mathcal{L}(X)$. Find a
minimal sufficient statistic in each of the following cases:
(a) $X \sim P(\lambda)$.
(b) $X \sim U[0, \theta]$.
(c) $X \sim NB(1; p)$.
(d) $X \sim P_N$, where $P_N\{X = k\} = 1/N$ if $k = 1, 2, \ldots, N$, and $= 0$ otherwise.
(e) $X \sim N(\mu, \sigma^2)$.
(f) $X \sim G(\alpha, \beta)$.
(g) $X \sim B(\alpha, \beta)$.
(h) $X \sim f_\theta(x)$, where $f_\theta(x) = (2/\theta^2)(\theta - x)$, $0 < x < \theta$.

12. Let $X_1, X_2$ be a sample of size 2 from $P(\lambda)$. Show that the statistic $X_1 + \alpha X_2$, where
$\alpha > 1$ is an integer, is not sufficient for $\lambda$.

13. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF

$$f_\theta(x) = \begin{cases} \dfrac{x}{\theta}\, e^{-x^2/2\theta} & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases} \qquad \theta > 0.$$

Show that $\sum_{i=1}^n X_i^2$ is a minimal sufficient statistic for $\theta$, but $\sum_{i=1}^n X_i$ is not sufficient.

14. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(0, \sigma^2)$. Show that $\sum_{i=1}^n X_i^2$ is a minimal
sufficient statistic but $\sum_{i=1}^n X_i$ is not sufficient for $\sigma^2$.

15. Let $X_1, X_2, \ldots, X_n$ be a sample from PDF $f_{\alpha,\beta}(x) = \beta e^{-\beta(x-\alpha)}$ if $x > \alpha$, and $= 0$ if
$x \le \alpha$. Find a minimal sufficient statistic for $(\alpha, \beta)$.

16. Let T be a minimal sufficient statistic. Show that a necessary condition for a
sufficient statistic U to be complete is that U be minimal.

17. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$. Show that $(\bar{X}, S^2)$ is independent of each of $(X_{(n)} -
X_{(1)})/S$, $(X_{(n)} - \bar{X})/S$, and $\sum_{i=1}^{n-1} (X_{i+1} - X_i)^2/S^2$.

18. Let $X_1, X_2, \ldots, X_n$ be iid $N(\theta, 1)$. Show that a necessary and sufficient condition for
$\sum_{i=1}^n a_i X_i$ and $\sum_{i=1}^n X_i$ to be independent is $\sum_{i=1}^n a_i = 0$.

19. Let $X_1, X_2, \ldots, X_n$ be a random sample from $f_\theta(x) = \exp\{-(x - \theta)\}$, $x > \theta$. Show
that $X_{(1)}$ is a complete sufficient statistic which is independent of $S^2$.

20. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF $f_\theta(x) = (1/\theta)\exp(-x/\theta)$,
$x > 0$, $\theta > 0$. Show that $\bar{X}$ must be independent of every scale-invariant statistic
such as $X_i/\sum_{j=1}^n X_j$.

21. Let $T_1, T_2$ be two statistics with common domain D. Then $T_1$ is a function of $T_2$ if
and only if

$$\text{for all } \mathbf{x}, \mathbf{y} \in D, \quad T_2(\mathbf{x}) = T_2(\mathbf{y}) \implies T_1(\mathbf{x}) = T_1(\mathbf{y}).$$

22. Let S be the support of $f_\theta$, $\theta \in \Theta$, and let T be a statistic such that for some $\theta_1, \theta_2 \in \Theta$,
and $\mathbf{x}, \mathbf{y} \in S$, $\mathbf{x} \ne \mathbf{y}$, $T(\mathbf{x}) = T(\mathbf{y})$ but $f_{\theta_1}(\mathbf{x}) f_{\theta_2}(\mathbf{y}) \ne f_{\theta_2}(\mathbf{x}) f_{\theta_1}(\mathbf{y})$. Then show that T is
not sufficient for $\theta$.

23. Let $X_1, X_2, \ldots, X_n$ be iid $N(\theta, 1)$. Use the result in Problem 22 to show that $\left(\sum_{i=1}^n X_i\right)^2$
is not sufficient for $\theta$.

24. (a) If T is complete then show that any one-to-one mapping of T is also complete.
(b) Show with the help of an example that a complete statistic is not unique for a
family of distributions.


8.4 UNBIASED ESTIMATION


In this section we focus attention on the class of unbiased estimators. We develop a 
criterion to check if an unbiased estimator is optimal in this class. Using sufficiency 
and completeness, we describe a method of constructing uniformly minimum variance 
unbiased estimators. 

Definition 1. Let $\{F_\theta, \theta \in \Theta\}$, $\Theta \subseteq \mathcal{R}_k$, be a nonempty set of probability distributions.
Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a multiple RV with DF $F_\theta$ and sample space $\mathcal{X}$. Let $\psi : \Theta \to \mathcal{R}$
be a real-valued parametric function. A Borel-measurable function $T : \mathcal{X} \to \Theta$ is said to
be unbiased for $\psi$ if

$$E_\theta T(\mathbf{X}) = \psi(\theta) \quad\text{for all } \theta \in \Theta. \tag{1}$$

Any parametric function $\psi$ for which there exists a T satisfying (1) is called an
estimable function. An estimator that is not unbiased is called biased, and the function
$b(T, \psi)$, defined by

$$b(T, \psi) = E_\theta T(\mathbf{X}) - \psi(\theta), \tag{2}$$

is called the bias of T.


Remark 1. Definition 1, in particular, requires that $E_\theta|T| < \infty$ for all $\theta \in \Theta$ and can be
extended to the case when both $\psi$ and T are multidimensional. In most applications we
consider $\Theta \subseteq \mathcal{R}_1$, $\psi(\theta) = \theta$, and $X_1, X_2, \ldots, X_n$ are iid RVs.


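Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from some population with finite
mean. Then $\bar{X}$ is unbiased for the population mean. If the population variance is finite, the
sample variance $S^2$ is unbiased for the population variance. In general, if the kth population
moment $m_k$ exists, the kth sample moment is unbiased for $m_k$.

Note that S is not, in general, unbiased for $\sigma$. If $X_1, X_2, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ RVs we
know that $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$. Therefore,

$$E\big(S\sqrt{n-1}/\sigma\big) = \int_0^\infty \sqrt{x}\, \frac{1}{2^{(n-1)/2}\,\Gamma[(n-1)/2]}\, x^{(n-1)/2 - 1} e^{-x/2}\, dx
= \sqrt{2}\, \Gamma\left(\frac{n}{2}\right) \left[\Gamma\left(\frac{n-1}{2}\right)\right]^{-1},$$

$$E_\sigma(S) = \sigma \sqrt{\frac{2}{n-1}}\, \Gamma\left(\frac{n}{2}\right) \left[\Gamma\left(\frac{n-1}{2}\right)\right]^{-1}.$$

The bias of S is given by

$$b(S, \sigma) = \sigma \left\{\sqrt{\frac{2}{n-1}}\, \Gamma\left(\frac{n}{2}\right) \left[\Gamma\left(\frac{n-1}{2}\right)\right]^{-1} - 1\right\}.$$

We note that $b(S, \sigma) \to 0$ as $n \to \infty$ so that S is asymptotically unbiased for $\sigma$.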
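
The size of the bias of S is easy to tabulate from the gamma-function formula above. The sketch below is an illustration only (the function names are assumptions). It uses log-gamma values to avoid overflow for large n and reports $b(S, \sigma)/\sigma$.

```python
import math

def mean_ratio(n):
    """E(S) / sigma for a normal sample of size n >= 2."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

def relative_bias(n):
    """b(S, sigma) / sigma, which does not depend on sigma."""
    return mean_ratio(n) - 1.0

for n in (2, 5, 10, 100, 1000):
    print(n, relative_bias(n))
```

The relative bias is negative for every n and shrinks toward zero as n grows, in line with the asymptotic unbiasedness noted in Example 1.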


If T is unbiased for $\theta$, $g(T)$ is not, in general, an unbiased estimator of $g(\theta)$ unless g is
a linear function.


Example 2. Unbiased estimators do not always exist. Consider an RV with PMF $b(1,p)$.
Suppose that we wish to estimate $\psi(p) = p^2$. Then, in order that T be unbiased for $p^2$, we
must have

$$p^2 = E_p T = pT(1) + (1-p)T(0), \qquad 0 \le p \le 1,$$

that is,

$$p^2 = p\{T(1) - T(0)\} + T(0)$$

must hold for all p in the interval [0, 1], which is impossible. (If a convergent power series
vanishes in an open interval, each of the coefficients must be 0. See also Problem 1.)


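Example 3. Sometimes an unbiased estimator may be absurd. Let X be $P(\lambda)$, and
$\psi(\lambda) = e^{-3\lambda}$. We show that $T(X) = (-2)^X$ is unbiased for $\psi(\lambda)$. We have

$$E_\lambda T(X) = e^{-\lambda} \sum_{x=0}^\infty (-2)^x \frac{\lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^\infty \frac{(-2\lambda)^x}{x!} = e^{-\lambda} e^{-2\lambda} = \psi(\lambda).$$

However, $T(x) = (-2)^x > 0$ if x is even, and $< 0$ if x is odd, which is absurd since $\psi(\lambda) > 0$.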
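
The series in Example 3 can be summed numerically as a check. The sketch below is an illustration only (the function name and the truncation point are assumptions). It accumulates the terms $e^{-\lambda}(-2\lambda)^x/x!$ and compares the total with $e^{-3\lambda}$.

```python
import math

def expected_T(lam, terms=120):
    """Truncated series for E_lambda[(-2)^X] when X is Poisson(lambda)."""
    total = 0.0
    term = math.exp(-lam)  # e^{-lam} (-2 lam)^x / x! at x = 0
    for x in range(terms):
        total += term
        term *= -2.0 * lam / (x + 1)  # move from x to x + 1
    return total

for lam in (0.1, 0.5, 1.0, 2.0):
    print(lam, expected_T(lam), math.exp(-3 * lam))
```

The estimator is unbiased for a quantity in (0, 1) even though each individual value $(-2)^x$ is either at least 1 or negative.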


Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $P(\lambda)$. Then $\bar{X}$ is unbiased for $\lambda$ and so
also is $S^2$, since both the mean and the variance are equal to $\lambda$. Indeed, $\alpha\bar{X} + (1 - \alpha)S^2$,
$0 \le \alpha \le 1$, is unbiased for $\lambda$.


Let $\theta$ be estimable, and let T be an unbiased estimator of $\theta$. Let $T_1$ be another unbiased
estimator of $\theta$, different from T. This means that there exists at least one $\theta$ such that $P_\theta\{T \ne
T_1\} > 0$. In this case there exist infinitely many unbiased estimators of $\theta$ of the form
$\alpha T + (1 - \alpha)T_1$, $0 < \alpha < 1$. It is therefore desirable to find a procedure to differentiate
among these estimators.


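Definition 2. Let $\theta_0 \in \Theta$ and $\mathcal{U}(\theta_0)$ be the class of all unbiased estimators T of $\theta_0$ such
that $E_{\theta_0} T^2 < \infty$. Then $T_0 \in \mathcal{U}(\theta_0)$ is called a locally minimum variance unbiased estimator
(LMVUE) at $\theta_0$ if

$$E_{\theta_0}(T_0 - \theta_0)^2 \le E_{\theta_0}(T - \theta_0)^2 \tag{3}$$

holds for all $T \in \mathcal{U}(\theta_0)$.


Definition 3. Let $\mathcal{U}$ be the set of all unbiased estimators T of $\theta \in \Theta$ such that $E_\theta T^2 < \infty$
for all $\theta \in \Theta$. An estimator $T_0 \in \mathcal{U}$ is called a uniformly minimum variance unbiased
estimator (UMVUE) of $\theta$ if

$$E_\theta(T_0 - \theta)^2 \le E_\theta(T - \theta)^2 \tag{4}$$

for all $\theta \in \Theta$ and every $T \in \mathcal{U}$.
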
Remark 2. Let $a_1, a_2, \ldots, a_n$ be any set of real numbers with $\sum_{i=1}^n a_i = 1$. Let
$X_1, X_2, \ldots, X_n$ be independent RVs with common mean $\mu$ and variances $\sigma_k^2$, $k = 1, 2, \ldots, n$.
Then $T = \sum_{i=1}^n a_i X_i$ is an unbiased estimator of $\mu$ with variance $\sum_{i=1}^n a_i^2 \sigma_i^2$ (see
Theorem 4.5.6). T is called a linear unbiased estimator of $\mu$. Linear unbiased estimators
of $\mu$ that have minimum variance (among all linear unbiased estimators) are called best
linear unbiased estimators (BLUEs). In Theorem 4.5.6 (Corollary 2) we have shown that,
if $X_i$ are iid RVs with common variance $\sigma^2$, the BLUE of $\mu$ is $\bar{X} = n^{-1} \sum_{i=1}^n X_i$. If $X_i$ are
independent with common mean $\mu$ but different variances $\sigma_i^2$, the BLUE of $\mu$ is obtained
if we choose $a_i$ proportional to $1/\sigma_i^2$; then the minimum variance is $H/n$, where H is the
harmonic mean of $\sigma_1^2, \ldots, \sigma_n^2$ (see Example 4.5.4).
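
The BLUE weights of Remark 2 can be computed exactly for a small case. The sketch below is an illustration only (the function names and the three variances are assumptions). It forms weights proportional to $1/\sigma_i^2$ and compares the resulting variance with $H/n$ and with the variance under equal weights.

```python
from fractions import Fraction

def blue_weights(variances):
    """Weights a_i proportional to 1 / sigma_i^2, scaled to sum to 1."""
    precisions = [Fraction(1) / v for v in variances]
    total = sum(precisions)
    return [w / total for w in precisions]

def variance_of(weights, variances):
    """Variance of sum a_i X_i for independent X_i."""
    return sum(a * a * v for a, v in zip(weights, variances))

variances = [Fraction(1), Fraction(2), Fraction(4)]  # hypothetical sigma_i^2
weights = blue_weights(variances)
n = len(variances)
harmonic_mean = n / sum(Fraction(1) / v for v in variances)

print(weights)
print(variance_of(weights, variances), harmonic_mean / n)
print(variance_of([Fraction(1, n)] * n, variances))  # equal weights, for comparison
```

With these variances the optimal weights are 4/7, 2/7, 1/7 and the variance is 4/7, smaller than the 7/9 obtained from the unweighted mean.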




Remark 3. Sometimes the precision of an estimator T of parameter $\theta$ is measured by the
so-called mean square error (MSE). We say that an estimator $T_0$ is at least as good as any
other estimator T in the sense of the MSE if

$$E_\theta(T_0 - \theta)^2 \le E_\theta(T - \theta)^2 \quad\text{for all } \theta \in \Theta. \tag{5}$$

In general, a particular estimator will be better than another for some values of $\theta$ and worse
for others. Definitions 2 and 3 are special cases of this concept if we restrict attention only
to unbiased estimators.


The following result gives a necessary and sufficient condition for an unbiased 
estimator to be a UMVUE. 


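Theorem 1. Let $\mathcal{U}$ be the class of all unbiased estimators T of a parameter $\theta \in \Theta$ with
$E_\theta T^2 < \infty$ for all $\theta$, and suppose that $\mathcal{U}$ is nonempty. Let $\mathcal{U}_0$ be the class of all unbiased
estimators v of 0, that is,

$$\mathcal{U}_0 = \{v : E_\theta v = 0,\ E_\theta v^2 < \infty \text{ for all } \theta \in \Theta\}.$$

Then $T_0 \in \mathcal{U}$ is a UMVUE if and only if

$$E_\theta(vT_0) = 0 \quad\text{for all } \theta \text{ and all } v \in \mathcal{U}_0. \tag{6}$$
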
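Proof. The conditions of the theorem guarantee the existence of $E_\theta(vT_0)$ for all $\theta$ and
$v \in \mathcal{U}_0$. Suppose that $T_0 \in \mathcal{U}$ is a UMVUE and $E_{\theta_0}(v_0 T_0) \ne 0$ for some $\theta_0$ and some
$v_0 \in \mathcal{U}_0$. Then $T_0 + \lambda v_0 \in \mathcal{U}$ for all real $\lambda$. If $E_{\theta_0} v_0^2 = 0$, then $E_{\theta_0}(v_0 T_0) = 0$ must hold
since $P_{\theta_0}\{v_0 = 0\} = 1$. Let $E_{\theta_0} v_0^2 > 0$. Choose $\lambda_0 = -E_{\theta_0}(T_0 v_0)/E_{\theta_0} v_0^2$. Then

$$E_{\theta_0}(T_0 + \lambda_0 v_0)^2 = E_{\theta_0} T_0^2 - \frac{E_{\theta_0}^2(v_0 T_0)}{E_{\theta_0} v_0^2} < E_{\theta_0} T_0^2. \tag{7}$$

Since $T_0 + \lambda_0 v_0 \in \mathcal{U}$ and $T_0 \in \mathcal{U}$, it follows from (7) that

$$\mathrm{var}_{\theta_0}(T_0 + \lambda_0 v_0) < \mathrm{var}_{\theta_0}(T_0), \tag{8}$$

which is a contradiction. It follows that (6) holds.

Conversely, let (6) hold for some $T_0 \in \mathcal{U}$, all $\theta \in \Theta$ and all $v \in \mathcal{U}_0$, and let $T \in \mathcal{U}$. Then
$T_0 - T \in \mathcal{U}_0$, and for every $\theta$

$$E_\theta\{T_0(T_0 - T)\} = 0.$$

We have

$$E_\theta T_0^2 = E_\theta(TT_0) \le (E_\theta T_0^2)^{1/2}(E_\theta T^2)^{1/2}$$

by the Cauchy-Schwarz inequality. If $E_\theta T_0^2 = 0$, then $P(T_0 = 0) = 1$ and there is nothing
to prove. Otherwise

$$(E_\theta T_0^2)^{1/2} \le (E_\theta T^2)^{1/2}$$

or $\mathrm{var}_\theta(T_0) \le \mathrm{var}_\theta(T)$. Since T is arbitrary, the proof is complete.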


Theorem 2. Let $\mathcal{U}$ be the nonempty class of unbiased estimators as defined in Theorem 1.
Then there exists at most one UMVUE for $\theta$.


Proof. If T and $T_0 \in \mathcal{U}$ are both UMVUEs, then $T - T_0 \in \mathcal{U}_0$ and

$$E_\theta\{T_0(T - T_0)\} = 0 \quad\text{for all } \theta \in \Theta,$$

that is, $E_\theta T_0^2 = E_\theta(TT_0)$, and it follows that

$$\mathrm{cov}(T, T_0) = \mathrm{var}_\theta(T_0) \quad\text{for all } \theta.$$

Since $T_0$ and T are both UMVUEs, $\mathrm{var}_\theta(T) = \mathrm{var}_\theta(T_0)$, and it follows that the correlation
coefficient between T and $T_0$ is 1. This implies that $P_\theta\{aT + bT_0 = 0\} = 1$ for some a, b
and all $\theta \in \Theta$. Since T and $T_0$ are both unbiased for $\theta$, we must have $P_\theta\{T = T_0\} = 1$
for all $\theta$.


Remark 4. Both Theorems | and 2 have analogs for LMVUE’s at 09 € ©, Oo fixed. 


Theorem 3. If UMVUEs 7; exist for real functions w;, i= 1,2, of 0, they also exist for 
Av; (A real), as well as for W; + w2, and are given by AT; and T; + T», respectively. 


Theorem 4. Let $\{T_n\}$ be a sequence of UMVUEs and $T$ be a statistic with $E_\theta T^2 < \infty$
such that $E_\theta\{T_n - T\}^2 \to 0$ as $n \to \infty$ for all $\theta \in \Theta$. Then $T$ is also the UMVUE.


Proof. That $T$ is unbiased follows from $|E_\theta T - \theta| \leq E_\theta|T - T_n| \leq E_\theta^{1/2}\{T_n - T\}^2$. For all
$v \in U_0$, all $\theta$, and every $n = 1, 2, \ldots$,

$$E_\theta(T_nv) = 0$$

by Theorem 1. Therefore,

$$E_\theta(vT) = E_\theta(vT) - E_\theta(vT_n) = E_\theta\{v(T - T_n)\}$$

and

$$|E_\theta(vT)| \leq (E_\theta v^2)^{1/2}[E_\theta(T - T_n)^2]^{1/2} \to 0 \quad \text{as } n \to \infty$$

for all $\theta$ and all $v \in U_0$. Thus

$$E_\theta(vT) = 0 \quad \text{for all } v \in U_0, \text{ all } \theta \in \Theta,$$

and, by Theorem 1, $T$ must be the UMVUE.


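Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $P(\lambda)$. Then $\bar{X}$ is the UMVUE of $\lambda$. Surely $\bar{X}$ is
unbiased. Let $g$ be an unbiased estimator of 0. Then $T(\bar{X}) = \bar{X} + g(\bar{X})$ is unbiased for $\lambda$. But
$\bar{X}$ is complete. It follows that

$$E_\lambda g(\bar{X}) = 0 \text{ for all } \lambda > 0 \ \Rightarrow\ g(x) = 0 \text{ for } x = 0, 1, 2, \ldots.$$

Hence $\bar{X}$ must be the UMVUE of $\lambda$.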


Example 6. Sometimes an estimator with larger variance may be preferable. 

Let $X$ be a $G(1, 1/\beta)$ RV. $X$ is usually taken as a good model to describe the time to
failure of a piece of equipment. Let $X_1, X_2, \ldots, X_n$ be a sample of $n$ observations on $X$. Then
$\bar{X}$ is unbiased for $EX = 1/\beta$ with variance $1/(n\beta^2)$. ($\bar{X}$ is actually the UMVUE for $1/\beta$.)
Now consider $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$. Then $nX_{(1)}$ is unbiased for $1/\beta$ with variance
$1/\beta^2$, and it has a larger variance than $\bar{X}$. However, if the length of time is of importance,
$nX_{(1)}$ may be preferable to $\bar{X}$, since to observe $nX_{(1)}$ one needs to wait only until the first
piece of equipment fails, whereas to compute $\bar{X}$ one would have to wait until all the $n$
observations $X_1, X_2, \ldots, X_n$ are available.
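
The trade-off in Example 6 can be illustrated with a small Monte Carlo sketch. It is only an illustration under assumed values (the function name `simulate`, the choices $n = 5$, $\beta = 2$, and the seed are not from the text): both estimators should average about $1/\beta$, while their sample variances should be near $1/(n\beta^2)$ and $1/\beta^2$, respectively.

```python
import random
import statistics

def simulate(n, beta, reps, seed=0):
    """Simulate `reps` samples of size n from G(1, 1/beta), i.e. the
    exponential distribution with rate beta (mean 1/beta).
    Returns the sample means and the scaled minima n * X_(1)."""
    rng = random.Random(seed)
    means, scaled_mins = [], []
    for _ in range(reps):
        sample = [rng.expovariate(beta) for _ in range(n)]
        means.append(sum(sample) / n)
        scaled_mins.append(n * min(sample))
    return means, scaled_mins

# Hypothetical settings chosen only for illustration.
means, scaled_mins = simulate(n=5, beta=2.0, reps=20000, seed=1)
print("mean of Xbar     :", statistics.fmean(means))
print("mean of n*X_(1)  :", statistics.fmean(scaled_mins))
print("var of Xbar      :", statistics.variance(means))        # theory: 1/(n beta^2) = 0.05
print("var of n*X_(1)   :", statistics.variance(scaled_mins))  # theory: 1/beta^2 = 0.25
```

The simulation only confirms the variance comparison; the practical argument for $nX_{(1)}$ (a shorter waiting time) is not something the code measures.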


Theorem 5. If a sample consists of $n$ independent observations $X_1, X_2, \ldots, X_n$ from the
same distribution, the UMVUE, if it exists, is a symmetric function of the $X_i$'s.


Proof. The proof is left as an exercise. 


The converse of Theorem 5 is not true. If $X_1, X_2, \ldots, X_n$ are iid $P(\lambda)$ RVs, $\lambda > 0$, both
$\bar{X}$ and $S^2$ are unbiased for $\lambda$. But $\bar{X}$ is the UMVUE, whereas $S^2$ is not.
We now turn our attention to some methods for finding UMVUE’s. 


Theorem 6. (Blackwell [10], Rao [87]). Let $\{F_\theta : \theta \in \Theta\}$ be a family of probability DFs
and $h$ be any statistic in $U$, where $U$ is the (nonempty) class of all unbiased estimators
of $\theta$ with $E_\theta h^2 < \infty$. Let $T$ be a sufficient statistic for $\{F_\theta, \theta \in \Theta\}$. Then the conditional
expectation $E_\theta\{h \mid T\}$ is independent of $\theta$ and is an unbiased estimator of $\theta$. Moreover,

$$E_\theta(E\{h \mid T\} - \theta)^2 \leq E_\theta(h - \theta)^2 \quad \text{for all } \theta \in \Theta. \tag{9}$$

The equality in (9) holds if and only if $h = E\{h \mid T\}$ (that is, $P_\theta\{h = E\{h \mid T\}\} = 1$
for all $\theta$).


Proof. We have

$$E_\theta\{E\{h \mid T\}\} = E_\theta h = \theta.$$

It is therefore sufficient to show that

$$E_\theta\{E\{h \mid T\}\}^2 \leq E_\theta h^2 \quad \text{for all } \theta \in \Theta. \tag{10}$$

But $E_\theta h^2 = E_\theta\{E\{h^2 \mid T\}\}$, so that it will be sufficient to show that

$$[E\{h \mid T\}]^2 \leq E\{h^2 \mid T\}. \tag{11}$$

By the Cauchy-Schwarz inequality

$$E^2\{h \mid T\} \leq E\{h^2 \mid T\}E\{1 \mid T\},$$

and (11) follows. The equality holds in (9) if and only if

$$E_\theta[E\{h \mid T\}]^2 = E_\theta h^2, \tag{12}$$

that is,

$$E_\theta[E\{h^2 \mid T\} - E^2\{h \mid T\}] = 0,$$

which is the same as

$$E_\theta\{\operatorname{var}\{h \mid T\}\} = 0.$$

This happens if and only if $\operatorname{var}\{h \mid T\} = 0$, that is, if and only if

$$E\{h^2 \mid T\} = E^2\{h \mid T\},$$

as will be the case if and only if $h$ is a function of $T$. Thus $h = E\{h \mid T\}$ with probability 1.

Theorem 6 is applied along with completeness to yield the following result. 
Theorem 7. (Lehmann-Scheffé [65]). If $T$ is a complete sufficient statistic and there exists
an unbiased estimator $h$ of $\theta$, there exists a unique UMVUE of $\theta$, which is given by
$E\{h \mid T\}$.

Proof. If $h_1, h_2 \in U$, then $E\{h_1 \mid T\}$ and $E\{h_2 \mid T\}$ are both unbiased and

$$E_\theta[E\{h_1 \mid T\} - E\{h_2 \mid T\}] = 0 \quad \text{for all } \theta \in \Theta.$$

Since $T$ is a complete sufficient statistic, it follows that $E\{h_1 \mid T\} = E\{h_2 \mid T\}$. By
Theorem 6, $E\{h \mid T\}$ is the UMVUE.


Remark 5. According to Theorem 6, we should restrict our search to Borel-measurable
functions of a sufficient statistic (whenever it exists). According to Theorem 7, if a complete
sufficient statistic $T$ exists, all we need to do is to find a Borel-measurable function
of $T$ that is unbiased. If a complete sufficient statistic does not exist, a UMVUE may still
exist (see Example 11).


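Example 7. Let $X_1, X_2, \ldots, X_n$ be $N(\theta, 1)$. $X_1$ is unbiased for $\theta$. However, $\bar{X} = n^{-1}\sum_{i=1}^n X_i$
is a complete sufficient statistic, so that $E\{X_1 \mid \bar{X}\}$ is the UMVUE.
We will show that $E\{X_1 \mid \bar{X}\} = \bar{X}$. Let $Y = n\bar{X}$. Then $Y$ is $N(n\theta, n)$, $X_1$ is $N(\theta, 1)$, and
$(X_1, Y)$ is a bivariate normal RV with variance covariance matrix
$\begin{pmatrix} 1 & 1 \\ 1 & n \end{pmatrix}$. Therefore,

$$E\{X_1 \mid y\} = EX_1 + \frac{\operatorname{cov}(X_1, Y)}{\operatorname{var}(Y)}(y - EY) = \theta + \frac{1}{n}(y - n\theta) = \frac{y}{n},$$

as asserted.

If we let $\psi(\theta) = \theta^2$, we can show similarly that $\bar{X}^2 - 1/n$ is the UMVUE for $\psi(\theta)$. Note
that $\bar{X}^2 - 1/n$ may occasionally be negative, so that a UMVUE for $\theta^2$ is not very sensible
in this case.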
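
The closing remark of Example 7 can be seen numerically. The following is a rough simulation sketch, not part of the original argument; the function name and the settings $n = 4$, $\theta = 0.3$ are assumptions made for illustration. The estimator $\bar{X}^2 - 1/n$ should average about $\theta^2$ even though a large share of the individual estimates are negative when $\theta$ is close to 0.

```python
import random
import statistics

def simulate_theta_sq(n, theta, reps, seed=0):
    """Draw `reps` samples of size n from N(theta, 1) and return the
    values of the estimator Xbar^2 - 1/n of theta^2."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        values.append(xbar * xbar - 1.0 / n)
    return values

# Hypothetical settings: a small sample and a parameter near zero.
values = simulate_theta_sq(n=4, theta=0.3, reps=40000, seed=2)
print("average estimate      :", statistics.fmean(values))  # theta^2 = 0.09
print("share of negatives    :", sum(v < 0 for v in values) / len(values))
```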


Example 8. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Then $T = \sum_{i=1}^n X_i$ is a complete sufficient
statistic. The UMVUE for $p$ is clearly $\bar{X}$. To find the UMVUE for $\psi(p) = p(1-p)$, we
have $E(nT) = n^2p$, $ET^2 = np + n(n-1)p^2$, so that $E\{nT - T^2\} = n(n-1)p(1-p)$, and it
follows that $(nT - T^2)/n(n-1)$ is the UMVUE for $\psi(p) = p(1-p)$.
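
The unbiasedness claim in Example 8 can be checked exactly for small $n$ by summing over the binomial PMF of $T$. The sketch below uses exact rational arithmetic; the function name `expected_estimator` is a hypothetical helper, not notation from the text.

```python
from fractions import Fraction
from math import comb

def expected_estimator(n, p):
    """Exact E_p[(nT - T^2) / (n(n-1))] for T ~ b(n, p), with p a Fraction."""
    total = Fraction(0)
    for t in range(n + 1):
        pmf = comb(n, t) * p**t * (1 - p)**(n - t)
        total += pmf * Fraction(n * t - t * t, n * (n - 1))
    return total

p = Fraction(3, 10)
print(expected_estimator(6, p), p * (1 - p))  # the two values should agree
```

This only verifies unbiasedness; that the estimator is the UMVUE rests on the completeness of $T$, as in the example.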


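Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. Then $(\bar{X}, S^2)$ is a complete
sufficient statistic for $(\mu, \sigma^2)$. $\bar{X}$ is the UMVUE for $\mu$, and $S^2$ is the UMVUE for $\sigma^2$. Also
$k(n)S$ is the UMVUE for $\sigma$, where $k(n) = \sqrt{(n-1)/2}\;\Gamma[(n-1)/2]/\Gamma(n/2)$. We wish to
find the UMVUE for the $p$th quantile $\mathfrak{z}_p$. We have

$$p = P\{X \leq \mathfrak{z}_p\} = P\left\{Z \leq \frac{\mathfrak{z}_p - \mu}{\sigma}\right\},$$

where $Z$ is $N(0, 1)$. Thus $\mathfrak{z}_p = \sigma z_{1-p} + \mu$, and the UMVUE is

$$T(X_1, X_2, \ldots, X_n) = z_{1-p}k(n)S + \bar{X}.$$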


Example 10. (Stigler [110]). We return to Example 8.3.14. We have seen that the family
$\{P_N^{(n)} : N \geq 1\}$ of PMFs of $X_{(n)} = \max_{1 \leq i \leq n} X_i$ is complete and $X_{(n)}$ is sufficient for
$N$. Now $EX_1 = (N+1)/2$, so that $T(X_1) = 2X_1 - 1$ is unbiased for $N$. It follows from
Theorem 7 that $E\{T(X_1) \mid X_{(n)}\}$ is the UMVUE of $N$. We have

$$P\{X_1 = x_1 \mid X_{(n)} = y\} = \begin{cases} \dfrac{y^{n-1} - (y-1)^{n-1}}{y^n - (y-1)^n} & \text{if } x_1 = 1, 2, \ldots, y-1, \\[2ex] \dfrac{y^{n-1}}{y^n - (y-1)^n} & \text{if } x_1 = y. \end{cases}$$

Thus

$$E\{T(X_1) \mid X_{(n)} = y\} = \frac{y^{n-1} - (y-1)^{n-1}}{y^n - (y-1)^n}\sum_{x_1=1}^{y-1}(2x_1 - 1) + (2y - 1)\frac{y^{n-1}}{y^n - (y-1)^n} = \frac{y^{n+1} - (y-1)^{n+1}}{y^n - (y-1)^n}$$

is the UMVUE of $N$.
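
As a sanity check on Example 10, the estimator $(y^{n+1} - (y-1)^{n+1})/(y^n - (y-1)^n)$ can be verified to be unbiased for $N$ by exact summation, using the standard fact that $P\{X_{(n)} = y\} = (y^n - (y-1)^n)/N^n$ for a sample from the uniform distribution on $\{1, \ldots, N\}$. The helper names below are hypothetical.

```python
from fractions import Fraction

def umvue_of_N(y, n):
    """Value of (y^(n+1) - (y-1)^(n+1)) / (y^n - (y-1)^n) at X_(n) = y."""
    return Fraction(y**(n + 1) - (y - 1)**(n + 1), y**n - (y - 1)**n)

def pmf_of_max(y, n, N):
    """P{X_(n) = y} for n iid draws, uniform on {1, ..., N}."""
    return Fraction(y**n - (y - 1)**n, N**n)

def expected_umvue(n, N):
    """Exact expectation of the estimator under P_N."""
    return sum(umvue_of_N(y, n) * pmf_of_max(y, n, N) for y in range(1, N + 1))

print(expected_umvue(n=3, N=7))  # should equal N
```

The sum telescopes to $N^{n+1}/N^n = N$, which is what the exact arithmetic reproduces.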

If we consider the family $\mathcal{P}$ instead, we have seen (Example 8.3.14 and Problem 8.3.6)
that $\mathcal{P}$ is not complete. The UMVUE for the family $\{P_N : N \geq 1\}$ is $T(X_1) = 2X_1 - 1$,
which is not the UMVUE for $\mathcal{P}$. The UMVUE for $\mathcal{P}$ is, in fact, given by

$$T_1(k) = \begin{cases} 2k - 1, & k \neq n_0,\ k \neq n_0 + 1, \\ 2n_0, & k = n_0,\ k = n_0 + 1. \end{cases}$$

The reader is asked to check that $T_1$ has covariance 0 with all unbiased estimators $g$ of 0
that are of the form described in Example 8.3.14 and Problem 8.3.6, and hence Theorem 1
implies that $T_1$ is the UMVUE. Actually $T_1(X_1)$ is a complete sufficient statistic for $\mathcal{P}$.
Since $E_{n_0}T_1(X_1) = n_0 + 1/n_0$, $T_1$ is not even unbiased for the family $\{P_N : N \geq 1\}$. The
minimum variance is given by

$$\operatorname{var}_N(T_1(X_1)) = \begin{cases} \operatorname{var}_N(T(X_1)) & \text{if } N < n_0, \\[1ex] \operatorname{var}_N(T(X_1)) - \dfrac{2}{N} & \text{if } N > n_0. \end{cases}$$


The following example shows that a UMVUE may exist even when the minimal sufficient
statistic is not complete.


Example 11. Let $X$ be an RV with PMF

$$P_\theta(X = -1) = \theta \quad \text{and} \quad P_\theta(X = x) = (1-\theta)^2\theta^x,$$

$x = 0, 1, 2, \ldots$, where $0 < \theta < 1$. Let $\psi(\theta) = P_\theta(X = 0) = (1-\theta)^2$. Then $X$ is clearly
sufficient, in fact minimal sufficient, for $\theta$ but since

$$E_\theta X = (-1)\theta + \sum_{x=0}^{\infty} x(1-\theta)^2\theta^x = -\theta + \theta(1-\theta)^2\sum_{x=1}^{\infty} x\theta^{x-1} = 0,$$

it follows that $X$ is not complete for $\{P_\theta : 0 < \theta < 1\}$. We will use Theorem 1 to check if
a UMVUE for $\psi(\theta)$ exists. Suppose

$$E_\theta h(X) = h(-1)\theta + \sum_{x=0}^{\infty}(1-\theta)^2\theta^xh(x) = 0$$

for all $0 < \theta < 1$. Then, for $0 < \theta < 1$,

$$0 = \theta h(-1) + \sum_{x=0}^{\infty}\theta^xh(x) - 2\sum_{x=0}^{\infty}\theta^{x+1}h(x) + \sum_{x=0}^{\infty}\theta^{x+2}h(x)
= h(0) + \sum_{x=0}^{\infty}\theta^{x+1}[h(x+1) - 2h(x) + h(x-1)],$$

which is a power series in $\theta$.
It follows that $h(0) = 0$, and for $x \geq 0$, $h(x+1) - 2h(x) + h(x-1) = 0$. Thus

$$h(1) = -h(-1), \quad h(2) = 2h(1) - h(0) = -2h(-1),$$
$$h(3) = 2h(2) - h(1) = -4h(-1) + h(-1) = -3h(-1),$$

and so on. Consequently, all unbiased estimators of 0 are of the form $h(X) = cX$ (with $c = -h(-1)$). Clearly,
$T(X) = 1$ if $X = 0$, and $= 0$ otherwise, is unbiased for $\psi(\theta)$. Moreover, for all $\theta$

$$E_\theta\{cX \cdot T(X)\} = 0$$

so that $T$ is UMVUE of $\psi(\theta)$.
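
The two facts used in Example 11, that the PMF sums to 1 and that $E_\theta X = 0$ for every $\theta$, can be checked numerically with a truncated series. This is only a floating-point sketch; the helper names and the truncation point are assumptions, not part of the text.

```python
def pmf(x, theta):
    """P_theta(X = x) for x = -1, 0, 1, 2, ... as given in Example 11."""
    return theta if x == -1 else (1 - theta) ** 2 * theta ** x

def truncated_expectation(h, theta, terms=5000):
    """Approximate E_theta h(X) by summing over x = -1, 0, ..., terms - 1."""
    return sum(h(x) * pmf(x, theta) for x in range(-1, terms))

for theta in (0.1, 0.5, 0.9):
    total_mass = truncated_expectation(lambda x: 1.0, theta)
    mean_x = truncated_expectation(lambda x: x, theta)
    prob_zero = truncated_expectation(lambda x: 1.0 if x == 0 else 0.0, theta)
    print(theta, total_mass, mean_x, prob_zero, (1 - theta) ** 2)
```

Since $X \cdot T(X)$ is identically zero, the orthogonality condition of Theorem 1 needs no numerical check.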
We conclude this section with a proof of Theorem 8.3.4. 
Theorem 8. (Theorem 8.3.4) A complete sufficient statistic is a minimal sufficient statistic.


Proof. Let $S(X)$ be a complete sufficient statistic for $\{f_\theta : \theta \in \Theta\}$ and let $T$ be any statistic
for which $E_\theta|T^2| < \infty$. Writing $h(S) = E_\theta\{T \mid S\}$ we see that $h$ is UMVUE of $E_\theta T$. Let
$S_1(X)$ be another sufficient statistic. We show that $h(S)$ is a function of $S_1$. If not, then
$h_1(S_1) = E_\theta\{h(S) \mid S_1\}$ is unbiased for $E_\theta T$ and by the Rao-Blackwell theorem

$$\operatorname{var}_\theta h_1(S_1) < \operatorname{var}_\theta h(S),$$

contradicting the fact that $h(S)$ is UMVUE for $E_\theta T$. It follows that $h(S)$ is a function of $S_1$.
Since $h$ and $S_1$ are arbitrary, $S$ must be a function of every sufficient statistic and hence,
minimal sufficient.


PROBLEMS 8.4 


1. Let $X_1, X_2, \ldots, X_n$ ($n \geq 2$) be a sample from $b(1, p)$. Find an unbiased estimator for
$\psi(p) = p^2$.

2. Let $X_1, X_2, \ldots, X_n$ ($n \geq 2$) be a sample from $N(\mu, \sigma^2)$. Find an unbiased estimator for
$\sigma^p$, where $p + n > 1$. Find a minimum MSE estimator of $\sigma^p$.

3. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Find a minimum MSE estimator of the form
$\alpha S^2$ for the parameter $\sigma^2$. Compare the variances of the minimum MSE estimator
and the obvious estimator $S^2$.

4. Let $X \sim b(1, \theta^2)$. Does there exist an unbiased estimator of $\theta$?

5. Let $X \sim P(\lambda)$. Does there exist an unbiased estimator of $\psi(\lambda) = \lambda^{-1}$?

6. Let $X_1, X_2, \ldots, X_n$ be a sample from $b(1, p)$, $0 < p < 1$, and $0 < s < n$ be an integer.
Find the UMVUE for (a) $\psi(p) = p^s$ and (b) $\psi(p) = p^s + (1-p)^{n-s}$.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with mean $\theta$ and finite variance, and
$T$ be an estimator of $\theta$ of the form $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n \alpha_iX_i$. If $T$ is an unbiased
estimator of $\theta$ that has minimum variance and $T'$ is another linear unbiased estimator
of $\theta$, then

$$\operatorname{cov}_\theta(T, T') = \operatorname{var}_\theta(T).$$


8. Let $T_1$, $T_2$ be two unbiased estimators having common variance $\alpha\sigma^2$ ($\alpha > 1$), where
$\sigma^2$ is the variance of the UMVUE. Show that the correlation coefficient between $T_1$
and $T_2$ is $\geq (2 - \alpha)/\alpha$.

9. Let $X \sim NB(1; \theta)$ and $d(\theta) = P_\theta\{X = 0\}$. Let $X_1, X_2, \ldots, X_n$ be a sample on $X$. Find
the UMVUE of $d(\theta)$.
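10. This example covers most discrete distributions. Let $X_1, X_2, \ldots, X_n$ be a sample from
PMF

$$P_\theta\{X = x\} = \frac{a(x)\theta^x}{f(\theta)}, \quad x = 0, 1, 2, \ldots,$$

where $\theta > 0$, $a(x) > 0$, $f(\theta) = \sum_{x=0}^{\infty} a(x)\theta^x$, $a(0) = 1$, and let $T = X_1 + X_2 + \cdots + X_n$.
Write

$$c(t, n) = \sum_{\substack{x_1, x_2, \ldots, x_n \\ \text{with } \sum_{i=1}^n x_i = t}}\ \prod_{i=1}^n a(x_i).$$

Show that $T$ is a complete sufficient statistic for $\theta$ and that the UMVUE for $d(\theta) = \theta^r$
($r > 0$ is an integer) is given by

$$\begin{cases} 0 & \text{if } t < r, \\[1ex] \dfrac{c(t - r, n)}{c(t, n)} & \text{if } t \geq r. \end{cases}$$

(Roy and Mitra [94])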
11. Let $X$ be a hypergeometric RV with PMF

$$P\{X = x\} = \binom{M}{x}\binom{N - M}{n - x}\bigg/\binom{N}{n},$$

where $\max(0, M + n - N) \leq x \leq \min(M, n)$.
(a) Find the UMVUE for $M$ when $N$ is assumed to be known.
(b) Does there exist an unbiased estimator of $N$ ($M$ known)?

12. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, 1/\lambda)$ RVs, $\lambda > 0$. Find the UMVUE of $P_\lambda\{X_1 \leq t_0\}$,
where $t_0 > 0$ is a fixed real number.


13. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Let $\psi(\lambda) = \sum_{k=0}^{\infty} c_k\lambda^k$ be a parametric
function. Find the UMVUE for $\psi(\lambda)$. In particular, find the UMVUE for
(a) $\psi(\lambda) = 1/(1 - \lambda)$, (b) $\psi(\lambda) = \lambda^s$ for some fixed integer $s > 0$, (c) $\psi(\lambda) =
P_\lambda\{X = 0\}$, and (d) $\psi(\lambda) = P_\lambda\{X = 0 \text{ or } 1\}$.

14. Let $X_1, X_2, \ldots, X_n$ be a sample from PMF


Let $\psi(N)$ be some function of $N$. Find the UMVUE of $\psi(N)$.

15. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Find the UMVUE of $\psi(\lambda) =
P_\lambda\{X = k\}$, where $k$ is a fixed positive integer.

16. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal population
with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Assume that $\mu_1 = \mu_2 = \mu$, and it is required
to find an unbiased estimator of $\mu$. Since a complete sufficient statistic does not exist,
consider the class of all linear unbiased estimators

$$\hat{\mu}(\alpha) = \alpha\bar{X} + (1 - \alpha)\bar{Y}.$$

(a) Find the variance of $\hat{\mu}$.
(b) Choose $\alpha = \alpha_0$ to minimize $\operatorname{var}(\hat{\mu})$ and consider the estimator

$$\hat{\mu}_0 = \alpha_0\bar{X} + (1 - \alpha_0)\bar{Y}.$$

Compute $\operatorname{var}(\hat{\mu}_0)$. If $\sigma_1 = \sigma_2$, the BLUE of $\mu$ (in the sense of minimum
variance) is

$$\hat{\mu}_1 = \frac{\bar{X} + \bar{Y}}{2}$$

irrespective of whether $\sigma_1$ and $\rho$ are known or unknown.

(c) If $\sigma_1 \neq \sigma_2$ and $\rho, \sigma_1, \sigma_2$ are unknown, replace these values in $\alpha_0$ by their
corresponding estimators. Let

$$\hat{\alpha} = \frac{S_2^2 - S_{11}}{S_1^2 + S_2^2 - 2S_{11}}.$$

Show that

$$\hat{\mu}_2 = \bar{Y} + (\bar{X} - \bar{Y})\hat{\alpha}$$

is an unbiased estimator of $\mu$.
17. Let $X_1, X_2, \ldots, X_n$ be iid $N(\theta, 1)$. Let $p = \Phi(x - \theta)$, where $\Phi$ is the DF of a $N(0, 1)$
RV. Show that the UMVUE of $p$ is given by $\Phi\left((x - \bar{x})\sqrt{\dfrac{n}{n-1}}\right)$.


18. Prove Theorem 5.

19. In Example 10 show that $T_1$ is the UMVUE for $N$ (restricted to the family $\mathcal{P}$), and
compute the minimum variance.


20. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate population with finite variances
$\sigma_1^2$ and $\sigma_2^2$, respectively, and covariance $\gamma$. Show that

$$\operatorname{var}(S_{11}) = \frac{1}{n}\left(\mu_{22} - \frac{n-2}{n-1}\gamma^2 + \frac{\sigma_1^2\sigma_2^2}{n-1}\right),$$

where $\mu_{22} = E[(X - EX)^2(Y - EY)^2]$. It is assumed that appropriate order moments
exist.

21. Suppose that a random sample is taken on $(X, Y)$ and it is desired to estimate $\gamma$,
the unknown covariance between $X$ and $Y$. Suppose that for some reason a set $S$ of
$n$ observations is available on both $X$ and $Y$, an additional $n_1 - n$ observations are
available on $X$ but the corresponding $Y$ values are missing, and an additional $n_2 - n$
observations of $Y$ are available for which the $X$ values are missing. Let $S_1$ be the set
of all $n_1 (\geq n)$ $X$ values, and $S_2$, the set of all $n_2 (\geq n)$ $Y$ values, and write

$$\hat{X} = \frac{\sum_{i \in S_1} X_i}{n_1}, \quad \hat{Y} = \frac{\sum_{i \in S_2} Y_i}{n_2}, \quad \bar{X} = \frac{\sum_{i \in S} X_i}{n}, \quad \bar{Y} = \frac{\sum_{i \in S} Y_i}{n}.$$

Show that

$$\hat{\gamma} = \frac{n_1n_2}{n(n_1n_2 - n_1 - n_2 + n)}\sum_{i \in S}(X_i - \hat{X})(Y_i - \hat{Y})$$

is an unbiased estimator of $\gamma$. Find the variance of $\hat{\gamma}$, and show that $\operatorname{var}(\hat{\gamma}) \leq
\operatorname{var}(S_{11})$, where $S_{11}$ is the usual unbiased estimator of $\gamma$ based on the $n$ observations
in $S$ (Boas [11]).

22. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x) = \exp(-x + \theta)$, $x > \theta$. Let $x_0$ be a
fixed real number. Find the UMVUE of $f_\theta(x_0)$.

23. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$ RVs. Let $T(X) = \sum_{i=1}^n X_i$. Show that $\varphi(x; t/n, (n-1)/n)$
is UMVUE of $\varphi(x; \mu, 1)$, where $\varphi(x; \mu, \sigma^2)$ is the PDF of a $N(\mu, \sigma^2)$ RV.

24. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, \theta)$ RVs. Show that the UMVUE of $f(x; \theta) =
(1/\theta)\exp(-x/\theta)$, $x > 0$, is given by $h(x \mid t)$, the conditional PDF of $X_1$ given $T(X) =
\sum_{i=1}^n X_i = t$, where

$$h(x \mid t) = (n-1)(t - x)^{n-2}/t^{n-1} \text{ for } x < t \text{ and } = 0 \text{ for } x > t.$$

25. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF $f_\theta(x) = 1/(2\theta)$, $|x| < \theta$, and $= 0$
elsewhere. Show that $T(X) = \max\{-X_{(1)}, X_{(n)}\}$ is a complete sufficient statistic for
$\theta$. Find the UMVU estimator of $\theta^r$.


26. Let $X_1, X_2, \ldots, X_n$ be a random sample from PDF

$$f_\theta(x) = (1/\sigma)\exp\{-(x - \mu)/\sigma\}, \quad x > \mu,\ \sigma > 0,$$

where $\theta = (\mu, \sigma)$.
(a) $\left(X_{(1)}, \sum_{j=1}^n (X_j - X_{(1)})\right)$ is a complete sufficient statistic for $\theta$.
(b) Show that the UMVUEs of $\mu$ and $\sigma$ are given by

$$\hat{\mu} = X_{(1)} - \frac{1}{n(n-1)}\sum_{j=1}^n (X_j - X_{(1)}), \quad \hat{\sigma} = \frac{1}{n-1}\sum_{j=1}^n (X_j - X_{(1)}).$$

(c) Find the UMVUE of $\psi(\mu, \sigma) = E_{\mu,\sigma}X_1$.
(d) Show that the UMVUE of $P(X_1 \geq t)$ is given by

$$\frac{n-1}{n}\left\{\left[1 - \frac{t - X_{(1)}}{\sum_{j=1}^n (X_j - X_{(1)})}\right]^+\right\}^{n-2},$$

where $x^+ = \max(x, 0)$.


8.5 UNBIASED ESTIMATION (CONTINUED): A LOWER BOUND FOR 
THE VARIANCE OF AN ESTIMATOR 


In this section we consider two inequalities, each of which provides a lower bound for 
the variance of an estimator. These inequalities can sometimes be used to show that an 
unbiased estimator is the UMVUE. We first consider an inequality due to Fréchet, Cramér, 
and Rao (the FCR inequality). 


Theorem 1. (Cramér [18], Fréchet [34], Rao [86]). Let $\Theta \subseteq \mathbb{R}$ be an open interval and
suppose the family $\{f_\theta : \theta \in \Theta\}$ satisfies the following regularity conditions:


(i) It has common support set $S$. Thus $S = \{x : f_\theta(x) > 0\}$ does not depend on $\theta$.
(ii) For $x \in S$ and $\theta \in \Theta$, the derivative $\frac{\partial}{\partial\theta}\log f_\theta(x)$ exists and is finite.
(iii) For any statistic $h$ with $E_\theta|h(X)| < \infty$ for all $\theta$, the operations of integration
(summation) and differentiation with respect to $\theta$ can be interchanged in $E_\theta h(X)$.
That is,

$$\frac{\partial}{\partial\theta}\int h(x)f_\theta(x)\,dx = \int h(x)\frac{\partial}{\partial\theta}f_\theta(x)\,dx \tag{1}$$

whenever the right-hand side of (1) is finite.


Let $T(X)$ be such that $\operatorname{var}_\theta T(X) < \infty$ for all $\theta$ and set $\psi(\theta) = E_\theta T(X)$. If $I(\theta) =
E_\theta\left\{\frac{\partial}{\partial\theta}\log f_\theta(X)\right\}^2$ satisfies $0 < I(\theta) < \infty$, then

$$\operatorname{var}_\theta T(X) \geq \frac{[\psi'(\theta)]^2}{I(\theta)}. \tag{2}$$

Proof. Since (iii) holds for $h \equiv 1$, we get


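$$0 = \int \frac{\partial}{\partial\theta}f_\theta(x)\,dx
= \int \left\{\frac{\partial}{\partial\theta}\log f_\theta(x)\right\}f_\theta(x)\,dx
= E_\theta\left\{\frac{\partial}{\partial\theta}\log f_\theta(X)\right\}. \tag{3}$$

Differentiating $\psi(\theta) = E_\theta T(X)$ and using (1) we get

$$\psi'(\theta) = \int T(x)\frac{\partial}{\partial\theta}f_\theta(x)\,dx
= \int \left\{T(x)\frac{\partial}{\partial\theta}\log f_\theta(x)\right\}f_\theta(x)\,dx
= \operatorname{cov}_\theta\left(T(X), \frac{\partial}{\partial\theta}\log f_\theta(X)\right). \tag{4}$$

Also, in view of (3) we have

$$\operatorname{var}_\theta\left(\frac{\partial}{\partial\theta}\log f_\theta(X)\right) = E_\theta\left\{\frac{\partial}{\partial\theta}\log f_\theta(X)\right\}^2$$

and using the Cauchy-Schwarz inequality in (4) we get

$$[\psi'(\theta)]^2 \leq \operatorname{var}_\theta T(X)\,E_\theta\left\{\frac{\partial}{\partial\theta}\log f_\theta(X)\right\}^2,$$

which proves (2). Practically the same proof may be given when $f_\theta$ is a PMF by replacing
$\int$ by $\sum$.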
Remark 1. If, in particular, $\psi(\theta) = \theta$, then (2) reduces to

$$\operatorname{var}_\theta(T(X)) \geq \frac{1}{I(\theta)}. \tag{5}$$


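Remark 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF (PMF) $f_\theta(x)$. Then

$$I(\theta) = E_\theta\left\{\frac{\partial \log f_\theta(\mathbf{X})}{\partial\theta}\right\}^2
= \sum_{i=1}^n E_\theta\left\{\frac{\partial \log f_\theta(X_i)}{\partial\theta}\right\}^2
= nE_\theta\left\{\frac{\partial \log f_\theta(X_1)}{\partial\theta}\right\}^2 = nI_1(\theta),$$

where $I_1(\theta) = E_\theta\left\{\frac{\partial \log f_\theta(X_1)}{\partial\theta}\right\}^2$. In this case the inequality (2) reduces to

$$\operatorname{var}_\theta(T(X)) \geq \frac{[\psi'(\theta)]^2}{nI_1(\theta)}.$$

Definition 1. The quantity

$$I_1(\theta) = E_\theta\left\{\frac{\partial \log f_\theta(X_1)}{\partial\theta}\right\}^2 \tag{6}$$

is called Fisher's information in $X_1$ and

$$I_n(\theta) = E_\theta\left\{\frac{\partial \log f_\theta(\mathbf{X})}{\partial\theta}\right\}^2 = nI_1(\theta) \tag{7}$$

is known as Fisher information in the random sample $X_1, X_2, \ldots, X_n$.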


Remark 3. As n gets larger, the lower bound for varg(T(X)) gets smaller. Thus, as the 
Fisher information increases, the lower bound decreases and the “best” estimator (one for 
which equality holds in (2)) will have smaller variance, consequently more information 
about 0. 


Remark 4. Regularity condition (i) is unnecessarily restrictive. An examination of the
proof shows that it is only necessary that (ii) and (iii) hold for (2) to hold. Condition (i)
excludes distributions such as $f_\theta(x) = 1/\theta$, $0 < x < \theta$, for which (3) fails to hold. It also
excludes densities such as $f_\theta(x) = 1$, $\theta < x < \theta + 1$, or $f_\theta(x) = \frac{2}{\pi}\sin^2(x + \pi)$, $\theta < x < \theta + \pi$,
each of which satisfies (iii) for $h \equiv 1$ so that (3) holds but not (1) for all $h$ with $E_\theta|h| < \infty$.


Remark 5. Sufficient conditions for regularity condition (iii) may be found in most calculus
textbooks. For example if (i) and (ii) hold then (iii) holds provided that for all $h$ with


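$E_\theta|h| < \infty$ for all $\theta \in \Theta$, both $E_\theta\left\{h(X)\frac{\partial \log f_\theta(X)}{\partial\theta}\right\}$ and $E_\theta\left\{\left|h(X)\frac{\partial \log f_\theta(X)}{\partial\theta}\right|\right\}$ are continuous
functions of $\theta$. Regularity conditions (i) to (iii) are satisfied for a one-parameter
exponential family.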


Remark 6. The inequality (2) holds trivially if $I(\theta) = \infty$ (and $\psi'(\theta)$ is finite) or if
$\operatorname{var}_\theta(T(X)) = \infty$.


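Example 1. Let $X \sim b(n, p)$; $\Theta = (0, 1) \subset \mathbb{R}$. Here the Fisher Information may be obtained
as follows:

$$\log f_p(x) = \log\binom{n}{x} + x\log p + (n - x)\log(1 - p),$$

$$\frac{\partial \log f_p(x)}{\partial p} = \frac{x}{p} - \frac{n - x}{1 - p},$$

and

$$E_p\left\{\frac{\partial \log f_p(X)}{\partial p}\right\}^2 = \frac{n}{p(1 - p)} = I(p).$$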


Let $\psi(p)$ be a function of $p$ and $T(X)$ be an unbiased estimator of $\psi(p)$. The only condition
that need be checked is differentiability under the summation sign. We have

$$\psi(p) = E_pT(X) = \sum_{x=0}^{n}\binom{n}{x}T(x)p^x(1 - p)^{n-x},$$

which is a polynomial in $p$ and hence can be differentiated with respect to $p$. For any
unbiased estimator $T(X)$ of $p$ we have

$$\operatorname{var}_p(T(X)) \geq \frac{1}{I(p)} = \frac{p(1 - p)}{n},$$

and since

$$\operatorname{var}_p\left(\frac{X}{n}\right) = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n},$$

it follows that the variance of the estimator $X/n$ attains the lower bound of the FCR
inequality, and hence $T(X) = X/n$ has least variance among all unbiased estimators of $p$. Thus
$T(X) = X/n$ is UMVUE for $p$.
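
The Fisher information in Example 1 can be confirmed by direct summation over the binomial PMF. The following is a small exact check in rational arithmetic; the function names are hypothetical helpers introduced only for this sketch.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def fisher_information(n, p):
    """Exact E_p[(d/dp log f_p(X))^2] for X ~ b(n, p), with p a Fraction."""
    total = Fraction(0)
    for x in range(n + 1):
        score = Fraction(x) / p - Fraction(n - x) / (1 - p)
        total += binomial_pmf(x, n, p) * score**2
    return total

def variance_of_proportion(n, p):
    """Exact variance of X/n for X ~ b(n, p)."""
    return sum(binomial_pmf(x, n, p) * (Fraction(x, n) - p)**2 for x in range(n + 1))

n, p = 8, Fraction(1, 3)
print(fisher_information(n, p), n / (p * (1 - p)))           # both equal I(p)
print(variance_of_proportion(n, p) * fisher_information(n, p))  # equals 1 when the bound is attained
```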


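Example 2. Let $X \sim P(\lambda)$. We leave the reader to check that the regularity conditions are
satisfied and

$$\operatorname{var}_\lambda(T(X)) \geq \lambda.$$

Since $T(X) = X$ has variance $\lambda$, $X$ is the UMVUE of $\lambda$. Similarly, if we take a sample
of size $n$ from $P(\lambda)$, we can show that

$$I_n(\lambda) = \frac{n}{\lambda} \quad \text{and} \quad \operatorname{var}_\lambda(T(X_1, \ldots, X_n)) \geq \frac{\lambda}{n},$$

and $\bar{X}$ is the UMVUE.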
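Let us next consider the problem of unbiased estimation of $\psi(\lambda) = e^{-\lambda}$ based on a
sample of size 1. The estimator

$$\delta(X) = \begin{cases} 1 & \text{if } X = 0, \\ 0 & \text{if } X \geq 1, \end{cases}$$

is unbiased for $\psi(\lambda)$ since

$$E_\lambda\delta(X) = E_\lambda[\delta(X)]^2 = P_\lambda\{X = 0\} = e^{-\lambda}.$$

Also,

$$\operatorname{var}_\lambda(\delta(X)) = e^{-\lambda}(1 - e^{-\lambda}).$$

To compute the FCR lower bound we have

$$\log f_\lambda(x) = x\log\lambda - \lambda - \log x!.$$

This has to be differentiated with respect to $e^{-\lambda}$, since we want a lower bound for an
estimator of the parameter $e^{-\lambda}$. Let $\theta = e^{-\lambda}$. Then

$$\log f_\theta(x) = x\log\log\frac{1}{\theta} + \log\theta - \log x!,$$

$$\frac{\partial}{\partial\theta}\log f_\theta(x) = x\,\frac{1}{\theta\log\theta} + \frac{1}{\theta},$$

and

$$E_\theta\left\{\frac{\partial}{\partial\theta}\log f_\theta(X)\right\}^2
= \frac{1}{\theta^2}\left\{1 + \frac{2}{\log\theta}\log\frac{1}{\theta} + \frac{1}{(\log\theta)^2}\left(\log\frac{1}{\theta} + \left(\log\frac{1}{\theta}\right)^2\right)\right\}
= e^{2\lambda}\left\{1 - 2 + \frac{1}{\lambda^2}(\lambda + \lambda^2)\right\}
= \frac{e^{2\lambda}}{\lambda} = I(\theta),$$

so that

$$\operatorname{var}_\theta T(X) \geq \frac{1}{I(\theta)} = \lambda e^{-2\lambda},$$

where $\theta = e^{-\lambda}$.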


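Since $e^{-\lambda}(1 - e^{-\lambda}) > \lambda e^{-2\lambda}$ for $\lambda > 0$, we see that $\operatorname{var}(\delta(X))$ is greater than the lower
bound obtained from the FCR inequality. We show next that $\delta(X)$ is the only unbiased
estimator of $\theta$ and hence is the UMVUE.

If $h$ is any unbiased estimator of $\theta$, it must satisfy $E_\theta h(X) = \theta$. That is, for all $\lambda > 0$

$$e^{-\lambda} = \sum_{k=0}^{\infty} h(k)e^{-\lambda}\frac{\lambda^k}{k!}.$$

Equating coefficients of powers of $\lambda$ we see immediately that $h(0) = 1$ and $h(k) = 0$ for
$k = 1, 2, \ldots$. It follows that $h(X) = \delta(X)$.

The same computation can be carried out when $X_1, X_2, \ldots, X_n$ is a random sample from
$P(\lambda)$. We leave the reader to show that the FCR lower bound for any unbiased estimator of
$\theta = e^{-\lambda}$ is $\lambda e^{-2\lambda}/n$. The estimator $\sum_{i=1}^n \delta(X_i)/n$ is clearly unbiased for $e^{-\lambda}$ with variance
$e^{-\lambda}(1 - e^{-\lambda})/n > (\lambda e^{-2\lambda})/n$. The UMVUE of $e^{-\lambda}$ is given by $T_0 = \left(\frac{n-1}{n}\right)^{\sum_{i=1}^n X_i}$ with
$\operatorname{var}_\lambda(T_0) = e^{-2\lambda}(e^{\lambda/n} - 1) > (\lambda e^{-2\lambda})/n$ for all $\lambda > 0$.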
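
The three variances compared in Example 2 can be put side by side numerically. The sketch below computes the first two moments of $T_0 = ((n-1)/n)^{\sum X_i}$ from a truncated Poisson series, using the standard fact that $\sum X_i \sim P(n\lambda)$; the function names, the truncation point, and the values $n = 5$, $\lambda = 1.3$ are assumptions made for illustration.

```python
import math

def poisson_pmf(k, mu):
    """P{Y = k} for Y ~ P(mu), computed on the log scale for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def moments_of_T0(n, lam, terms=400):
    """Mean and variance of T0 = ((n-1)/n)^T with T ~ P(n*lam), truncated series."""
    a = (n - 1) / n
    mu = n * lam
    m1 = sum(a**t * poisson_pmf(t, mu) for t in range(terms))
    m2 = sum(a**(2 * t) * poisson_pmf(t, mu) for t in range(terms))
    return m1, m2 - m1**2

n, lam = 5, 1.3
mean_T0, var_T0 = moments_of_T0(n, lam)
fcr_bound = lam * math.exp(-2 * lam) / n
var_indicator_mean = math.exp(-lam) * (1 - math.exp(-lam)) / n
print("E T0          :", mean_T0, "target", math.exp(-lam))
print("FCR bound     :", fcr_bound)
print("var T0 (UMVUE):", var_T0)
print("var of mean of indicators:", var_indicator_mean)
```

For these values the ordering is FCR bound < var(T0) < variance of the averaged indicators, in line with the inequalities stated in the example.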


Corollary. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x)$. Suppose the family $\{f_\theta :
\theta \in \Theta\}$ satisfies the conditions of Theorem 1. Then equality holds in (2) if and only if, for
all $\theta \in \Theta$,

$$T(\mathbf{x}) - \psi(\theta) = k(\theta)\frac{\partial}{\partial\theta}\log f_\theta(\mathbf{x}) \tag{8}$$

for some function $k(\theta)$.


Proof. Recall that we derived (2) by an application of the Cauchy-Schwarz inequality, where
equality holds if and only if (8) holds.


Remark 7. Integrating (8) with respect to $\theta$ we get

$$\log f_\theta(\mathbf{x}) = Q(\theta)T(\mathbf{x}) + S(\theta) + A(\mathbf{x})$$

for some functions $Q$, $S$, and $A$. It follows that $f_\theta$ is a one-parameter exponential family
and the statistic $T$ is sufficient for $\theta$.


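Remark 8. A result that simplifies computations is the following. If $f_\theta$ is twice differentiable
and $E_\theta\left\{\frac{\partial}{\partial\theta}\log f_\theta(X)\right\}$ can be differentiated under the expectation sign, then

$$I(\theta) = E_\theta\left\{\frac{\partial}{\partial\theta}\log f_\theta(X)\right\}^2 = -E_\theta\left\{\frac{\partial^2}{\partial\theta^2}\log f_\theta(X)\right\}. \tag{9}$$

For the proof of (9), it is straightforward to check that

$$\frac{\partial^2}{\partial\theta^2}\log f_\theta(x) = \frac{f_\theta''(x)}{f_\theta(x)} - \left\{\frac{\partial}{\partial\theta}\log f_\theta(x)\right\}^2.$$

Taking expectations on both sides we get (9).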
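Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$. Then

$$\log f_\mu(x) = -\frac{1}{2}\log(2\pi) - \frac{(x - \mu)^2}{2},$$

$$\frac{\partial}{\partial\mu}\log f_\mu(x) = x - \mu,$$

$$\frac{\partial^2}{\partial\mu^2}\log f_\mu(x) = -1.$$

Hence $I(\mu) = 1$ and $I_n(\mu) = n$.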


We next consider an inequality due to Chapman, Robbins, and Kiefer (the CRK inequal- 
ity) that gives a lower bound for the variance of an estimator but does not require regularity 
conditions of the Fréchet-Cramér—Rao type. 


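Theorem 2 (Chapman and Robbins [12], Kiefer [52]). Let $\Theta \subset \mathbb{R}$ and $\{f_\theta(x) : \theta \in \Theta\}$
be a class of PDFs (PMFs). Let $\psi$ be defined on $\Theta$, and let $T$ be an unbiased estimator
of $\psi(\theta)$ with $E_\theta T^2 < \infty$ for all $\theta \in \Theta$. If $\theta \neq \varphi$, assume that $f_\theta$ and $f_\varphi$ are different and
assume further that there exists a $\varphi \in \Theta$ such that $\theta \neq \varphi$ and

$$S(\theta) = \{f_\theta(x) > 0\} \supset S(\varphi) = \{f_\varphi(x) > 0\}. \tag{10}$$

Then

$$\operatorname{var}_\theta(T(X)) \geq \sup_{\{\varphi : S(\varphi) \subset S(\theta),\ \varphi \neq \theta\}} \frac{[\psi(\varphi) - \psi(\theta)]^2}{\operatorname{var}_\theta\{f_\varphi(X)/f_\theta(X)\}} \tag{11}$$

for all $\theta \in \Theta$.

Proof. Since $T$ is unbiased for $\psi$, $E_\varphi T(X) = \psi(\varphi)$ for all $\varphi \in \Theta$. Hence, for $\varphi \neq \theta$,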


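$$\int_{S(\theta)} T(x)\,\frac{f_\varphi(x) - f_\theta(x)}{f_\theta(x)}\,f_\theta(x)\,dx = \psi(\varphi) - \psi(\theta), \tag{12}$$

which yields

$$\operatorname{cov}_\theta\left\{T(X), \frac{f_\varphi(X)}{f_\theta(X)} - 1\right\} = \psi(\varphi) - \psi(\theta).$$

Using the Cauchy-Schwarz inequality, we get

$$\operatorname{cov}_\theta^2\left\{T(X), \frac{f_\varphi(X)}{f_\theta(X)} - 1\right\}
\leq \operatorname{var}_\theta(T(X))\operatorname{var}_\theta\left\{\frac{f_\varphi(X)}{f_\theta(X)} - 1\right\}
= \operatorname{var}_\theta(T(X))\operatorname{var}_\theta\left\{\frac{f_\varphi(X)}{f_\theta(X)}\right\}.$$

Thus

$$\operatorname{var}_\theta(T(X)) \geq \frac{[\psi(\varphi) - \psi(\theta)]^2}{\operatorname{var}_\theta\{f_\varphi(X)/f_\theta(X)\}},$$

and the result follows. In the discrete case it is necessary only to replace the integral in the
left side of (12) by a sum. The rest of the proof needs no change.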


Remark 9. Inequality (11) holds without any regularity conditions on $f_\theta$ or $\psi(\theta)$. We will
show that it covers some nonregular cases of the FCR inequality. Sometimes (11) is available
in an alternative form. Let $\theta$ and $\theta + \delta$ ($\delta \neq 0$) be any two distinct values in $\Theta$ such
that $S(\theta + \delta) \subset S(\theta)$, and take $\psi(\theta) = \theta$. Write

$$J = J(\theta, \delta) = \frac{1}{\delta^2}\left\{\left(\frac{f_{\theta+\delta}(X)}{f_\theta(X)}\right)^2 - 1\right\}.$$

Then (11) can be written as

$$\operatorname{var}_\theta(T(X)) \geq \frac{1}{\inf_\delta E_\theta J}, \tag{13}$$

where the infimum is taken over all $\delta \neq 0$ such that $S(\theta + \delta) \subset S(\theta)$.

Remark 10. Inequality (11) applies if the parameter space is discrete, but the Fréchet-Cramér-Rao
regularity conditions do not hold in that case.


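Example 4. Let $X$ be $U[0, \theta]$. The regularity conditions of the FCR inequality do not hold in
this case. Let $\psi(\theta) = \theta$. If $\varphi < \theta$, then $S(\varphi) \subset S(\theta)$. Also,

$$E_\theta\left\{\frac{f_\varphi(X)}{f_\theta(X)}\right\}^2 = \int_0^{\varphi}\left(\frac{\theta}{\varphi}\right)^2\frac{1}{\theta}\,dx = \frac{\theta}{\varphi}.$$

Thus

$$\operatorname{var}_\theta(T(X)) \geq \sup_{\varphi : \varphi < \theta}\frac{(\varphi - \theta)^2}{(\theta/\varphi) - 1} = \sup_{\varphi : \varphi < \theta}\{\varphi(\theta - \varphi)\} = \frac{\theta^2}{4}$$

for any unbiased estimator $T(X)$ of $\theta$. $X$ is a complete sufficient statistic, and $2X$ is unbiased
for $\theta$ so that $T(X) = 2X$ is the UMVUE. Also

$$\operatorname{var}_\theta(2X) = 4\operatorname{var}X = \frac{\theta^2}{3} > \frac{\theta^2}{4}.$$

Thus the lower bound of $\theta^2/4$ of the CRK inequality is not achieved by any unbiased
estimator of $\theta$.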
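
The supremum in Example 4 can be approximated by a simple grid search over $\varphi \in (0, \theta)$. This is a minimal sketch; the function name and the grid size are assumptions, and the value $\theta = 2$ is chosen only for illustration.

```python
def crk_bound_uniform(theta, grid=10000):
    """Approximate the sup over 0 < phi < theta of phi * (theta - phi),
    which is the CRK bound for estimating theta from one U[0, theta] observation."""
    best = 0.0
    for i in range(1, grid):
        phi = theta * i / grid
        best = max(best, phi * (theta - phi))
    return best

theta = 2.0
print("CRK bound  :", crk_bound_uniform(theta))   # theta^2 / 4
print("var of 2X  :", theta**2 / 3)               # variance of the UMVUE
```

The gap between the two printed numbers is the sense in which the CRK bound is not attained here.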


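Example 5. Let $X$ have PMF

$$P_N\{X = k\} = \begin{cases} \dfrac{1}{N}, & k = 1, 2, \ldots, N, \\[1ex] 0, & \text{otherwise}. \end{cases}$$

Let $\Theta = \{N : N \geq M,\ M > 1 \text{ given}\}$. Take $\psi(N) = N$. Although the FCR regularity
conditions do not hold, (11) is applicable since, for $N \neq N' \in \Theta \subset \mathbb{R}$,

$$S(N) = \{1, 2, \ldots, N\} \supset S(N') = \{1, 2, \ldots, N'\} \quad \text{if } N' < N.$$

Also, $P_N$ and $P_{N'}$ are different for $N \neq N'$. Thus

$$\operatorname{var}_N(T) \geq \sup_{N' < N}\frac{(N - N')^2}{\operatorname{var}_N\{P_{N'}/P_N\}}.$$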


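Now

$$\frac{P_{N'}(x)}{P_N(x)} = \begin{cases} \dfrac{N}{N'}, & x = 1, 2, \ldots, N',\ N' < N, \\[1ex] 0, & \text{otherwise}, \end{cases}$$

$$E_N\left\{\frac{P_{N'}(X)}{P_N(X)}\right\}^2 = \frac{1}{N}\left(\frac{N}{N'}\right)^2N' = \frac{N}{N'},$$

and

$$\operatorname{var}_N\left\{\frac{P_{N'}(X)}{P_N(X)}\right\} = \frac{N}{N'} - 1 > 0 \quad \text{for } N > N'.$$

It follows that

$$\operatorname{var}_N(T(X)) \geq \sup_{N' < N}\frac{(N - N')^2}{(N - N')/N'} = \sup_{N' < N} N'(N - N').$$

Now

$$\frac{k(N - k)}{(k - 1)(N - k + 1)} > 1 \quad \text{if and only if } k < \frac{N + 1}{2},$$

so that $N'(N - N')$ increases as long as $N' < (N + 1)/2$ and decreases if $N' > (N + 1)/2$.
The maximum is achieved at $N' = [(N + 1)/2]$ if $M \leq (N + 1)/2$ and at $N' = M$ if $M >
(N + 1)/2$, where $[x]$ is the largest integer $\leq x$. Therefore,

$$\operatorname{var}_N(T(X)) \geq \left[\frac{N + 1}{2}\right]\left\{N - \left[\frac{N + 1}{2}\right]\right\} \quad \text{if } M \leq (N + 1)/2,$$

and

$$\operatorname{var}_N(T(X)) \geq M(N - M) \quad \text{if } M > (N + 1)/2.$$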
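
The closed form at the end of Example 5 can be compared with a brute-force maximization of $N'(N - N')$ over the admissible values $M \leq N' < N$. The sketch below assumes $N > M$ so that at least one admissible $N'$ exists; the function names are hypothetical.

```python
def crk_bound_bruteforce(N, M):
    """Maximum of k * (N - k) over integers M <= k < N (requires N > M)."""
    return max(k * (N - k) for k in range(M, N))

def crk_bound_formula(N, M):
    """Closed form from Example 5, with [x] the largest integer <= x."""
    half = (N + 1) // 2
    if M <= (N + 1) / 2:
        return half * (N - half)
    return M * (N - M)

for N, M in [(10, 2), (11, 3), (10, 8), (7, 5)]:
    print(N, M, crk_bound_bruteforce(N, M), crk_bound_formula(N, M))
```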


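Example 6. Let $X \sim N(0, \sigma^2)$, and let $X_1, X_2, \ldots, X_n$ be a sample on $X$. Let us compute $J$
(see Remark 9) for $\delta \neq 0$:

$$J = \frac{1}{\delta^2}\left\{\left(\frac{f_{\sigma+\delta}(\mathbf{X})}{f_\sigma(\mathbf{X})}\right)^2 - 1\right\}
= \frac{1}{\delta^2}\left[\left(\frac{\sigma}{\sigma + \delta}\right)^{2n}\exp\left\{-\frac{\sum X_i^2}{(\sigma + \delta)^2} + \frac{\sum X_i^2}{\sigma^2}\right\} - 1\right]
= \frac{1}{\delta^2}\left[\left(\frac{\sigma}{\sigma + \delta}\right)^{2n}\exp\left\{\frac{\sum X_i^2}{\sigma^2}\cdot\frac{\delta^2 + 2\sigma\delta}{(\sigma + \delta)^2}\right\} - 1\right],$$

$$E_\sigma J = \frac{1}{\delta^2}\left[\left(\frac{\sigma}{\sigma + \delta}\right)^{2n}E_\sigma\exp\left\{c\,\frac{\sum X_i^2}{\sigma^2}\right\} - 1\right],$$

where $c = (\delta^2 + 2\sigma\delta)/(\sigma + \delta)^2$.
Since $\sum X_i^2/\sigma^2 \sim \chi^2(n)$,

$$E_\sigma J = \frac{1}{\delta^2}\left[\left(\frac{\sigma}{\sigma + \delta}\right)^{2n}\frac{1}{(1 - 2c)^{n/2}} - 1\right] \quad \text{for } c < \frac{1}{2}.$$

Let $k = \delta/\sigma$; then

$$c = \frac{2k + k^2}{(1 + k)^2}, \quad 1 - 2c = \frac{1 - 2k - k^2}{(1 + k)^2},$$

$$E_\sigma J = \frac{1}{k^2\sigma^2}\left[(1 + k)^{-n}(1 - 2k - k^2)^{-n/2} - 1\right].$$

Here $1 + k > 0$ and $1 - 2c > 0$, so that $1 - 2k - k^2 > 0$, implying $-\sqrt{2} < k + 1 < \sqrt{2}$ and
also $k > -1$. Thus $-1 < k < \sqrt{2} - 1$ and $k \neq 0$. Also,

$$\lim_{k \to 0} E_\sigma J = \lim_{k \to 0}\frac{(1 + k)^{-n}(1 - 2k - k^2)^{-n/2} - 1}{k^2\sigma^2} = \frac{2n}{\sigma^2}$$

by L'Hospital's rule. We leave the reader to check that the reciprocal, $\sigma^2/(2n)$, is the FCR lower
bound for $\operatorname{var}_\sigma(T(X))$. But the minimum value of $E_\sigma J$ is not achieved in the neighborhood of $k = 0$,
so that the CRK inequality is sharper than the FCR inequality. Next, we show that for
$n = 2$ we can do better with the CRK inequality. We have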


$$E_\sigma J = \frac{1}{k^2\sigma^2}\left[\frac{1}{(1 + k)^2(1 - 2k - k^2)} - 1\right]
= \frac{(k + 2)^2}{\sigma^2(1 + k)^2(1 - 2k - k^2)}, \quad -1 < k < \sqrt{2} - 1,\ k \neq 0.$$

For $k = -0.1607$ we achieve the lower bound as $(E_\sigma J)^{-1} = 0.2698\sigma^2$, so that
$\operatorname{var}_\sigma(T(X)) \geq 0.2698\sigma^2 > \sigma^2/4$. Finally, we show that this bound is by no means the
best available; it is possible to improve on the Chapman-Robbins-Kiefer bounds too in
some cases. Take


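$$T(X_1, X_2, \ldots, X_n) = \frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\cdot\frac{\sigma}{\sqrt{2}}\sqrt{\frac{\sum_{i=1}^n X_i^2}{\sigma^2}}$$

to be an estimate of $\sigma$. Now $E_\sigma T = \sigma$ and

$$E_\sigma T^2 = \frac{\sigma^2}{2}\left(\frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\right)^2E_\sigma\left(\frac{\sum_{i=1}^n X_i^2}{\sigma^2}\right)
= \frac{n\sigma^2}{2}\left(\frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\right)^2,$$

so that

$$\operatorname{var}_\sigma(T) = \sigma^2\left\{\frac{n}{2}\left(\frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\right)^2 - 1\right\}.$$

For $n = 2$,

$$\operatorname{var}_\sigma(T) = \sigma^2\left(\frac{4}{\pi} - 1\right) = 0.2732\sigma^2,$$

which is $> 0.2698\sigma^2$, the CRK bound. Note that $T$ is the UMVUE.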
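
The numbers quoted in Example 6 can be reproduced with a grid search over $k = \delta/\sigma$. The sketch below works in units of $\sigma^2$ (it minimizes $\sigma^2E_\sigma J$); the function names, the step size, and the upper grid limit 0.41 (just inside $\sqrt{2} - 1$) are assumptions of this illustration.

```python
import math

def scaled_expected_J(k, n):
    """sigma^2 * E_sigma J as a function of k = delta / sigma,
    valid for -1 < k < sqrt(2) - 1 and k != 0."""
    return ((1 + k) ** (-n) * (1 - 2 * k - k * k) ** (-n / 2) - 1) / (k * k)

def crk_bound(n, step=1e-4):
    """Grid-minimize sigma^2 * E J; return (argmin k, bound in units of sigma^2)."""
    best_k, best_val = None, math.inf
    for i in range(int(1.41 / step)):
        k = -1 + (i + 0.5) * step          # midpoints, so k is never exactly 0
        val = scaled_expected_J(k, n)
        if val < best_val:
            best_k, best_val = k, val
    return best_k, 1 / best_val

k_star, bound = crk_bound(2)
print("minimizing k      :", k_star)
print("CRK bound / sigma^2:", bound)
print("FCR bound / sigma^2:", 1 / (2 * 2))
print("var(T) / sigma^2   :", 4 / math.pi - 1)
```

For $n = 2$ the three printed values should be ordered as FCR bound < CRK bound < variance of the UMVUE.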


Remark 11. In general the CRK inequality is at least as sharp as the FCR inequality. See Chapman
and Robbins [12, pp. 584-585], for details.


We next introduce the concept of efficiency.

Definition 2. Let $T_1$, $T_2$ be two unbiased estimators for a parameter $\theta$. Suppose that
$E_\theta T_1^2 < \infty$, $E_\theta T_2^2 < \infty$. We define the efficiency of $T_1$ relative to $T_2$ by

$$\operatorname{eff}_\theta(T_1 \mid T_2) = \frac{\operatorname{var}_\theta(T_2)}{\operatorname{var}_\theta(T_1)} \tag{14}$$

and say that $T_1$ is more efficient than $T_2$ if

$$\operatorname{eff}_\theta(T_1 \mid T_2) > 1. \tag{15}$$

It is usual to consider the performance of an unbiased estimator by comparing its 
variance with the lower bound given by the FCR inequality. 


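Definition 3. Assume that the regularity conditions of the FCR inequality are satisfied by
the family of DFs $\{F_\theta, \theta \in \Theta\}$, $\Theta \subset \mathbb{R}$. We say that an unbiased estimator $T$ for parameter
$\theta$ is most efficient for the family $\{F_\theta\}$ if

$$\operatorname{var}_\theta(T) = \left[E_\theta\left\{\frac{\partial \log f_\theta(\mathbf{X})}{\partial\theta}\right\}^2\right]^{-1} = 1/I_n(\theta). \tag{16}$$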


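Definition 4. Let $T$ be the most efficient estimator for the regular family of DFs $\{F_\theta,
\theta \in \Theta\}$. Then the efficiency of any unbiased estimator $T_1$ of $\theta$ is defined as

$$\operatorname{eff}_\theta(T_1) = \operatorname{eff}_\theta(T_1 \mid T) = \frac{\operatorname{var}_\theta(T)}{\operatorname{var}_\theta(T_1)} = \frac{1}{I_n(\theta)\operatorname{var}_\theta(T_1)}. \tag{17}$$

Clearly, the efficiency of the most efficient estimator is 1, and the efficiency of any
unbiased estimator $T_1$ is $\leq 1$.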


Definition 5. We say that an estimator $T_1$ is asymptotically (most) efficient if

$$\lim_{n \to \infty}\operatorname{eff}_\theta(T_1) = 1 \tag{18}$$

and $T_1$ is at least asymptotically unbiased in the sense that $\lim_{n \to \infty} E_\theta T_1 = \theta$. Here $n$ is
the sample size.


Remark 12. Definition 3, although in common usage, has many drawbacks. We have 
already seen cases in which the regularity conditions are not satisfied and yet UMVUEs 
exist. The definition does not cover such cases. Moreover, in many cases where the regu- 
larity conditions are satisfied and UMVUEs exist, the UMVUE is not most efficient since 
the variance of the best estimator (the UMVUE) does not achieve the lower bound of the 
FCR inequality. 


Example 7. Let $X \sim b(n, p)$. Then we have seen in Example 1 that $X/n$ is the UMVUE
since its variance achieves the lower bound of the FCR inequality. It follows that $X/n$ is
most efficient.

Example 8. Let $X_1, X_2, \ldots, X_n$ be iid $P(\lambda)$ RVs and suppose $\psi(\lambda) = P_\lambda(X = 0) = e^{-\lambda}$.
From Example 2, the UMVUE of $\psi$ is given by $T_0 = \left(\frac{n-1}{n}\right)^{\sum_{i=1}^n X_i}$ with

$$\operatorname{var}_\lambda(T_0) = e^{-2\lambda}(e^{\lambda/n} - 1).$$

Also $I_n(\lambda) = n/(\lambda e^{-2\lambda})$. It follows that

$$\operatorname{eff}_\lambda(T_0) = \frac{(\lambda e^{-2\lambda})/n}{e^{-2\lambda}(e^{\lambda/n} - 1)} < \frac{\lambda e^{-2\lambda}/n}{e^{-2\lambda}(\lambda/n)} = 1$$

since $e^x - 1 > x$ for $x > 0$. Thus $T_0$ is not most efficient. However, since $\operatorname{eff}_\lambda(T_0) \to 1$ as
$n \to \infty$, $T_0$ is asymptotically efficient.
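
The efficiency in Example 8 simplifies to $(\lambda/n)/(e^{\lambda/n} - 1)$, which makes the approach to 1 easy to tabulate. The following short sketch does so; the function name and the value $\lambda = 1$ are assumptions for illustration.

```python
import math

def efficiency_T0(lam, n):
    """eff_lambda(T0) = (lam/n) / (exp(lam/n) - 1), the FCR bound divided by var(T0)."""
    x = lam / n
    return x / math.expm1(x)   # expm1 keeps precision when lam/n is small

for n in (1, 2, 5, 10, 100, 1000):
    print(n, efficiency_T0(1.0, n))
```

Every printed value is below 1 and the sequence increases toward 1 as $n$ grows, matching the claim of asymptotic efficiency.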


In view of Remarks 6 and 7, the following result describes the relationship between 
most efficient unbiased estimators and UMVUEs. 


Theorem 3. A necessary and sufficient condition for an unbiased estimator $T$ of $\psi$ to be
most efficient is that $T$ be sufficient and the relation (8) holds for some function $k(\theta)$.


Clearly, an estimator $T$ satisfying the conditions of Theorem 3 will be the UMVUE, and
the two estimators coincide. We emphasize that we have assumed the regularity conditions of
the FCR inequality in making this statement.

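Example 9. Let $(X, Y)$ be jointly distributed with PDF

$$f_\theta(x, y) = \exp\left\{-\left(\frac{x}{\theta} + \theta y\right)\right\}, \quad x > 0,\ y > 0.$$

For a sample $(x, y)$ of size 1, we have

$$\frac{\partial}{\partial\theta}\log f_\theta(x, y) = \frac{x}{\theta^2} - y.$$

Hence, information for this sample is

$$I(\theta) = E_\theta\left(\frac{X}{\theta^2} - Y\right)^2 = \frac{E(X^2)}{\theta^4} + E(Y^2) - \frac{2E(XY)}{\theta^2}.$$

Now

$$E_\theta(Y^2) = \frac{2}{\theta^2}, \quad E_\theta(X^2) = 2\theta^2, \quad \text{and} \quad E(XY) = 1,$$

so that

$$I(\theta) = \frac{2}{\theta^2} + \frac{2}{\theta^2} - \frac{2}{\theta^2} = \frac{2}{\theta^2}.$$

Therefore, the amount of Fisher's Information in a sample of $n$ pairs is $2n/\theta^2$.

We return to Example 8.3.23 where $X_1, X_2, \ldots, X_n$ are iid $G(1, \theta)$ and $Y_1, Y_2, \ldots, Y_n$
are iid $G(1, 1/\theta)$, and $X$'s and $Y$'s are independent. Then $(X_1, Y_1)$ has common PDF
$f_\theta(x, y)$ given above. We will compute Fisher's Information for $\theta$ in the family of PDFs
of $S(\mathbf{X}, \mathbf{Y}) = (\sum X_i/\sum Y_i)^{1/2}$. Using the PDFs of $\sum X_i \sim G(n, \theta)$ and $\sum Y_i \sim G(n, 1/\theta)$
and the transformation technique, it is easy to see that $S(\mathbf{X}, \mathbf{Y})$ has PDF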

_ (Qn) ifs , 0 —_ : 
805) = Trap ( *) , ood 


Thus 


An? joa n _ 2n 2n 
— 2(2n+1)f 62 \2n4+1 


That is, the information about 0 in S is smaller than that in the sample. 
The Fisher Information in the conditional PDF of S given A = a, where A(X, Y) = 
S\(X)S2(¥), can be shown (Problem 12) to equal 


2a Ki (2a) 
62 Ko(2a)’ 


where Ko and Kj are Bessel functions of order 0 and 1, respectively. Averaging over all val- 
ues of A, one can show that the information is 27/ 6? which is the total Fisher information 
in the sample of n pairs (x;,y;)’s. 
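
The single-pair information $2/\theta^2$ of Example 9 can be checked by simulation. The sketch below is only an illustration: the function names, the seed, and the number of draws are assumptions made here, and the Monte Carlo value is an approximation, not a proof.

```python
import random

def fisher_info_single_pair(theta, draws=200_000, seed=0):
    """Monte Carlo estimate of E[(X/theta^2 - Y)^2] for one pair (X, Y)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        x = rng.expovariate(1.0 / theta)  # X ~ G(1, theta), mean theta
        y = rng.expovariate(theta)        # Y ~ G(1, 1/theta), mean 1/theta
        score = x / theta**2 - y          # d/dtheta of log f_theta(x, y)
        total += score * score
    return total / draws

def info_in_S(theta, n):
    """Closed form (2n/theta^2) * 2n/(2n+1) for the information in S."""
    return (2 * n / theta**2) * (2 * n / (2 * n + 1))
```

For any $n$, `info_in_S(theta, n)` stays strictly below $2n/\theta^2$, which is the loss of information incurred by reducing the sample to $S$.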


PROBLEMS 8.5 


1. Are the following families of distributions regular in the sense of Fréchet, Cramér, 
and Rao? If so, find the lower bound for the variance of an unbiased estimator based 
on a sample size n. 


(a) $f_\theta(x) = \theta^{-1} e^{-x/\theta}$ if $x > 0$, and $= 0$ otherwise; $\theta > 0$.
(b) $f_\theta(x) = e^{-(x-\theta)}$ if $\theta < x < \infty$, and $= 0$ otherwise.
(c) $f_\theta(x) = \theta(1-\theta)^x$, $x = 0, 1, 2, \ldots$, $0 < \theta < 1$.
(d) $f(x; \sigma^2) = (1/\sigma\sqrt{2\pi})\, e^{-x^2/2\sigma^2}$, $-\infty < x < \infty$; $\sigma^2 > 0$.

2. Find the CRK lower bound for the variance of an unbiased estimator of $\theta$, based on
a sample of size $n$ from the PDF of Problem 1(b).


3. Find the CRK bound for the variance of an unbiased estimator of $\theta$ in sampling from
$N(\theta, 1)$.
4. In Problem 1 check to see whether there exists a most efficient estimator in each
case.


5. Let $X_1, X_2, \ldots, X_n$ be a sample from a three-point distribution:

$$P\{X = y_1\} = \frac{1-\theta}{2}, \quad P\{X = y_2\} = \frac{1}{2}, \quad P\{X = y_3\} = \frac{\theta}{2},$$

where $0 < \theta < 1$. Does the FCR inequality apply in this case? If so, what is the lower
bound for the variance of an unbiased estimator of $\theta$?


6. Let $X_1, X_2, \ldots, X_n$ be iid RVs with mean $\mu$ and finite variance. What is the efficiency
of the unbiased (and consistent) estimator $[2/n(n+1)] \sum_{i=1}^{n} iX_i$ relative to $\bar{X}$?


7. When does the equality hold in the CRK inequality?

8. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$, and let $d(\mu) = \mu^2$:

(a) Show that the minimum variance of any estimator of $\mu^2$ from the FCR inequality
is $4\mu^2/n$.

(b) Show that $T(X_1, X_2, \ldots, X_n) = \bar{X}^2 - (1/n)$ is the UMVUE of $\mu^2$ with variance
$(4\mu^2/n + 2/n^2)$.


9. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, 1/\alpha)$ RVs:

(a) Show that the estimator $T(X_1, X_2, \ldots, X_n) = (n-1)/(n\bar{X})$ is the UMVUE for $\alpha$
with variance $\alpha^2/(n-2)$.

(b) Show that the minimum variance from the FCR inequality is $\alpha^2/n$.

10. In Problem 8.4.16 compute the relative efficiency of $\hat\mu_0$ with respect to $\hat\mu$.


11. Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ be independent samples from $N(\mu, \sigma_1^2)$ and
$N(\mu, \sigma_2^2)$, respectively, where $\mu, \sigma_1^2, \sigma_2^2$ are unknown. Let $\rho = \sigma_2^2/\sigma_1^2$ and $\theta = m/n$,
and consider the problem of unbiased estimation of $\mu$:

(a) If $\rho$ is known, show that

$$\hat\mu_0 = \alpha\bar{X} + (1-\alpha)\bar{Y},$$

where $\alpha = \rho/(\rho + \theta)$, is the BLUE of $\mu$. Compute $\mathrm{var}(\hat\mu_0)$.

(b) If $\rho$ is unknown, the unbiased estimator

$$\hat\mu = \frac{\bar{X} + \theta\bar{Y}}{1 + \theta}$$

is optimum in the neighborhood of $\rho = 1$. Find the variance of $\hat\mu$.

(c) Compute the efficiency of $\hat\mu$ relative to $\hat\mu_0$.

(d) Another unbiased estimator of $\mu$ is

$$\frac{\rho F\bar{X} + \theta\bar{Y}}{\theta + \rho F},$$

where $F = S_2^2/(\rho S_1^2)$ is an $F(m-1, n-1)$ RV.

12. Show that the Fisher Information on $\theta$ based on the PDF

$$\frac{1}{2K_0(2a)}\, s^{-1} \exp\left\{-a\left(\frac{s}{\theta} + \frac{\theta}{s}\right)\right\}, \quad s > 0,$$

for fixed $a$ equals $\dfrac{2a}{\theta^2}\dfrac{K_1(2a)}{K_0(2a)}$, where $K_0(2a)$ and $K_1(2a)$ are Bessel functions of order
0 and 1, respectively.
8.6 SUBSTITUTION PRINCIPLE (METHOD OF MOMENTS)


One of the simplest and oldest methods of estimation is the substitution principle: Let
$\psi(\theta)$, $\theta \in \Theta$, be a parametric function to be estimated on the basis of a random sample
$X_1, X_2, \ldots, X_n$ from a population DF $F$. Suppose we can write $\psi(\theta) = h(F)$ for some known
function $h$. Then the substitution principle estimator of $\psi(\theta)$ is $h(F_n^*)$, where $F_n^*$ is the
sample distribution function. Accordingly we estimate $\mu = \mu(F)$ by $\mu(F_n^*) = \bar{X}$, $m_k =
E_F X^k$ by $\sum_{j=1}^{n} X_j^k/n$, and so on. The method of moments is a special case when we need to
estimate some known function of a finite number of unknown moments. Let us suppose
that we are interested in estimating

$$\theta = h(m_1, m_2, \ldots, m_k), \qquad (1)$$

where $h$ is some known numerical function and $m_j$ is the $j$th-order moment of the
population distribution that is known to exist for $1 \le j \le k$.


Definition 1. The method of moments consists in estimating $\theta$ by the statistic

$$T(X_1, \ldots, X_n) = h\left(n^{-1}\sum_{1}^{n} X_i,\ n^{-1}\sum_{1}^{n} X_i^2,\ \ldots,\ n^{-1}\sum_{1}^{n} X_i^k\right). \qquad (2)$$

To make sure that $T$ is a statistic, we will assume that $h : \mathcal{R}_k \to \mathcal{R}$ is a Borel-measurable
function.


Remark 1. It is easy to extend the method to the estimation of joint moments. Thus we
use $n^{-1}\sum_{1}^{n} X_iY_i$ to estimate $E(XY)$ and so on.


Remark 2. From the WLLN, $n^{-1}\sum_{i=1}^{n} X_i^j \xrightarrow{P} EX^j$. Thus, if one is interested in estimating
the population moments, the method of moments leads to consistent and unbiased estimators.
Moreover, the method of moments estimators in this case are asymptotically normally
distributed (see Section 7.5).

Again, if one estimates parameters of the type $\theta$ defined in (1) and $h$ is a continuous
function, the estimators $T(X_1, X_2, \ldots, X_n)$ defined in (2) are consistent for $\theta$ (see Problem 1).
Under some mild conditions on $h$, the estimator $T$ is also asymptotically normal
(see Cramér [17, pp. 386-387]).


Example 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common mean $\mu$ and variance $\sigma^2$. Then
$\sigma = \sqrt{m_2 - m_1^2}$, and the method of moments estimator for $\sigma$ is given by

$$T(X_1, \ldots, X_n) = \sqrt{\frac{1}{n}\sum_{1}^{n} X_i^2 - \bar{X}^2} = \sqrt{\frac{1}{n}\sum_{1}^{n} (X_i - \bar{X})^2}.$$

Although $T$ is consistent and asymptotically normal for $\sigma$, it is not unbiased.

In particular, if $X_1, X_2, \ldots, X_n$ are iid $P(\lambda)$ RVs, we know that $EX_1 = \lambda$ and $\mathrm{var}(X_1) = \lambda$.
The method of moments leads to using either $\bar{X}$ or $\sum_{1}^{n}(X_i - \bar{X})^2/n$ as an estimator of $\lambda$.
To avoid this kind of ambiguity we take the estimator involving the lowest-order sample
moment.


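Example 2. Let $X_1, X_2, \ldots, X_n$ be a sample from

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$EX = \frac{a+b}{2} \quad \text{and} \quad \mathrm{var}(X) = \frac{(b-a)^2}{12}.$$

The method of moments leads to estimating $EX$ by $\bar{X}$ and $\mathrm{var}(X)$ by $\sum_{1}^{n}(X_i - \bar{X})^2/n$ so
that the estimators for $a$ and $b$, respectively, are

$$T_1(X_1, \ldots, X_n) = \bar{X} - \sqrt{\frac{3\sum_{1}^{n}(X_i - \bar{X})^2}{n}}$$

and

$$T_2(X_1, \ldots, X_n) = \bar{X} + \sqrt{\frac{3\sum_{1}^{n}(X_i - \bar{X})^2}{n}}.$$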
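
The two estimators of Example 2 can be computed directly from a sample. The following is a minimal sketch; the function name `uniform_moment_estimators` is hypothetical, and the divisor $n$ (not $n-1$) is used as in the text.

```python
import math

def uniform_moment_estimators(xs):
    """Method of moments estimators (T1, T2) of (a, b) for a uniform sample."""
    n = len(xs)
    mean = sum(xs) / n
    var_n = sum((x - mean) ** 2 for x in xs) / n  # second central sample moment
    half_width = math.sqrt(3.0 * var_n)           # (b - a)/2 = sqrt(3 var(X))
    return mean - half_width, mean + half_width
```

Note that nothing forces $T_1 \le \min X_i$ or $T_2 \ge \max X_i$, so the estimated interval need not contain every observation.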


Example 3. Let $X_1, X_2, \ldots, X_N$ be iid $b(n, p)$ RVs, where both $n$ and $p$ are unknown. The
method of moments estimators of $p$ and $n$ are obtained from the equations

$$\bar{X} = EX = np$$

and

$$\frac{1}{N}\sum_{1}^{N} X_i^2 = EX^2 = np(1-p) + n^2p^2.$$

Solving for $n$ and $p$, we get the estimator for $p$ as

$$T_1(X_1, \ldots, X_N) = \frac{\bar{X}}{T_2(X_1, \ldots, X_N)},$$

where $T_2(X_1, \ldots, X_N)$ is the estimator for $n$, given by

$$T_2(X_1, X_2, \ldots, X_N) = \frac{(\bar{X})^2}{\bar{X} + \bar{X}^2 - \left(\sum_{1}^{N} X_i^2/N\right)}.$$

Note that $\bar{X} \xrightarrow{P} np$ and $\sum_{1}^{N} X_i^2/N \xrightarrow{P} np(1-p) + n^2p^2$, so that both $T_1$ and $T_2$ are consistent
estimators.


Method of moments may lead to absurd estimators. The reader is asked to compute
estimators of $\theta$ in $N(\theta, \theta)$ or $N(\theta, \theta^2)$ by the method of moments and verify this assertion.
PROBLEMS 8.6


1. Let $X_n \xrightarrow{P} a$, and $Y_n \xrightarrow{P} b$, where $a$ and $b$ are constants. Let $h : \mathcal{R}_2 \to \mathcal{R}$ be a continuous
function. Show that $h(X_n, Y_n) \xrightarrow{P} h(a, b)$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(\alpha, \beta)$. Find the method of moments estimator
for $(\alpha, \beta)$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. Find the method of moments estimator
for $(\mu, \sigma^2)$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $B(\alpha, \beta)$. Find the method of moments estimator
for $(\alpha, \beta)$.


5. A random sample of size $n$ is taken from the lognormal PDF

$$f(x; \mu, \sigma) = (\sigma\sqrt{2\pi})^{-1} x^{-1} \exp\left\{-\frac{1}{2\sigma^2}(\log x - \mu)^2\right\}, \quad x > 0.$$

Find the method of moments estimators for $\mu$ and $\sigma^2$.


8.7 MAXIMUM LIKELIHOOD ESTIMATORS 


In this section we study a frequently used method of estimation, namely, the method of 
maximum likelihood estimation. Consider the following example. 


Example 1. Let $X \sim b(n, p)$. One observation on $X$ is available, and it is known that $n$ is
either 2 or 3 and $p = \frac{1}{2}$ or $\frac{1}{3}$. Our objective is to estimate the pair $(n, p)$. The following
table gives the probability that $X = x$ for each possible pair $(n, p)$:


 x    (2, 1/2)   (2, 1/3)   (3, 1/2)   (3, 1/3)   Maximum Probability
 0      1/4        4/9        1/8        8/27            4/9
 1      1/2        4/9        3/8       12/27            1/2
 2      1/4        1/9        3/8        6/27            3/8
 3       0          0         1/8        1/27            1/8


The last column gives the maximum probability in each row, that is, for each value that
$X$ assumes. If the value $x = 1$, say, is observed, it is more probable that it came from
the distribution $b(2, \frac{1}{2})$ than from any of the other distributions and so on. The following
estimator is, therefore, reasonable in that it maximizes the probability of the observed
value:

$$(\hat{n}, \hat{p})(x) = \begin{cases} (2, \frac{1}{3}) & \text{if } x = 0, \\ (2, \frac{1}{2}) & \text{if } x = 1, \\ (3, \frac{1}{2}) & \text{if } x = 2, \\ (3, \frac{1}{2}) & \text{if } x = 3. \end{cases}$$
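
The table and the resulting estimator can be reproduced by direct enumeration. The sketch below uses exact rational arithmetic; the names `candidates`, `binom_pmf`, and `mle_pair` are introduced here for illustration only.

```python
from fractions import Fraction
from math import comb

# The four admissible (n, p) pairs of Example 1.
candidates = [(2, Fraction(1, 2)), (2, Fraction(1, 3)),
              (3, Fraction(1, 2)), (3, Fraction(1, 3))]

def binom_pmf(x, n, p):
    """P{X = x} for X ~ b(n, p), as an exact fraction."""
    if x > n:
        return Fraction(0)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def mle_pair(x):
    """The (n, p) pair that maximizes the probability of the observed x."""
    return max(candidates, key=lambda c: binom_pmf(x, c[0], c[1]))
```

Because the parameter set is finite, no calculus is involved: the estimator is simply the row-wise argmax of the table.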
The principle of maximum likelihood essentially assumes that the sample is representative
of the population and chooses as the estimator that value of the parameter which
maximizes the PDF (PMF) $f_\theta(\mathbf{x})$.


Definition 1. Let $(X_1, X_2, \ldots, X_n)$ be a random vector with PDF (PMF) $f_\theta(x_1, x_2, \ldots, x_n)$,
$\theta \in \Theta$. The function

$$L(\theta; x_1, x_2, \ldots, x_n) = f_\theta(x_1, x_2, \ldots, x_n), \qquad (1)$$

considered as a function of $\theta$, is called the likelihood function.


Usually $\theta$ will be a multiple parameter. If $X_1, X_2, \ldots, X_n$ are iid with PDF (PMF) $f_\theta(x)$,
the likelihood function is

$$L(\theta; x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_\theta(x_i). \qquad (2)$$

Let $\Theta \subseteq \mathcal{R}_k$ and $\mathbf{X} = (X_1, X_2, \ldots, X_n)$.


Definition 2. The principle of maximum likelihood estimation consists of choosing as an
estimator of $\theta$ a $\hat\theta(\mathbf{X})$ that maximizes $L(\theta; x_1, x_2, \ldots, x_n)$, that is, to find a mapping $\hat\theta$ of
$\mathcal{R}_n \to \mathcal{R}_k$ that satisfies

$$L(\hat\theta; x_1, x_2, \ldots, x_n) = \sup_{\theta \in \Theta} L(\theta; x_1, x_2, \ldots, x_n). \qquad (3)$$

(Constants are not admissible as estimators.)


If a $\hat\theta$ satisfying (3) exists, we call it a maximum likelihood estimator (MLE).
It is convenient to work with the logarithm of the likelihood function. Since log is a
monotone function,

$$\log L(\hat\theta; x_1, \ldots, x_n) = \sup_{\theta \in \Theta} \log L(\theta; x_1, \ldots, x_n). \qquad (4)$$

Let $\Theta$ be an open subset of $\mathcal{R}_k$, and suppose that $f_\theta(\mathbf{x})$ is a positive, differentiable
function of $\theta$ (that is, the first-order partial derivatives exist in the components of $\theta$). If a
supremum $\hat\theta$ exists, it must satisfy the likelihood equations

$$\frac{\partial \log L(\hat\theta; x_1, \ldots, x_n)}{\partial\theta_j} = 0, \quad j = 1, 2, \ldots, k, \quad \theta = (\theta_1, \ldots, \theta_k). \qquad (5)$$

Any nontrivial root of the likelihood equations (5) is called an MLE in the loose sense.
A parameter value that provides the absolute maximum of the likelihood function is called
an MLE in the strict sense or, simply, an MLE.


Remark 1. If $\Theta \subseteq \mathcal{R}$, there may still be many problems. Often the likelihood equation
$\partial L/\partial\theta = 0$ has more than one root, or the likelihood function is not differentiable
everywhere in $\Theta$, or $\hat\theta$ may be a terminal value. Sometimes the likelihood equation
may be quite complicated and difficult to solve explicitly. In that case one may have to
resort to some numerical procedure to obtain the estimator. Similar remarks apply to the
multiparameter case.


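Example 2. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are
unknown. Here $\Theta = \{(\mu, \sigma^2), -\infty < \mu < \infty, \sigma^2 > 0\}$. The likelihood function is

$$L(\mu, \sigma^2; x_1, \ldots, x_n) = \frac{1}{\sigma^n (2\pi)^{n/2}} \exp\left\{-\sum_{1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}\right\}$$

and

$$\log L(\mu, \sigma^2; \mathbf{x}) = -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log(2\pi) - \frac{\sum_{1}^{n}(x_i - \mu)^2}{2\sigma^2}.$$

The likelihood equations are

$$\frac{1}{\sigma^2}\sum_{1}^{n}(x_i - \mu) = 0$$

and

$$-\frac{n}{2}\frac{1}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{1}^{n}(x_i - \mu)^2 = 0.$$

Solving the first of these equations for $\mu$, we get $\hat\mu = \bar{X}$ and, substituting in the second,
$\hat\sigma^2 = \sum_{1}^{n}[(X_i - \bar{X})^2/n]$. We see that $(\hat\mu, \hat\sigma^2) \in \Theta$ with probability 1. We show that $(\hat\mu, \hat\sigma^2)$
maximizes the likelihood function. First note that $\bar{X}$ maximizes $L(\mu, \sigma^2; \mathbf{x})$ whatever $\sigma^2$
is, since $L(\mu, \sigma^2; \mathbf{x}) \to 0$ as $|\mu| \to \infty$, and in that case $L(\hat\mu, \sigma^2; \mathbf{x}) \to 0$ as $\sigma^2 \to 0$ or $\infty$
whenever $\hat\theta \in \Theta$, $\hat\theta = (\hat\mu, \hat\sigma^2)$.

Note that $\hat\sigma^2$ is not unbiased for $\sigma^2$. Indeed, $E\hat\sigma^2 = [(n-1)/n]\sigma^2$. But $n\hat\sigma^2/(n-1) = S^2$
is unbiased, as we already know. Also, $\hat\mu$ is unbiased, and both $\hat\mu$ and $\hat\sigma^2$ are consistent. In
addition, $\hat\mu$ and $\hat\sigma^2$ are method of moments estimators for $\mu$ and $\sigma^2$, and $(\hat\mu, \hat\sigma^2)$ is jointly
sufficient.

Finally, note that $\hat\mu$ is the MLE of $\mu$ if $\sigma^2$ is known; but if $\mu$ is known, the MLE of $\sigma^2$
is not $\hat\sigma^2$ but $\sum_{1}^{n}(X_i - \mu)^2/n$.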


Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from PMF

$$P_N(k) = \begin{cases} \dfrac{1}{N}, & k = 1, 2, \ldots, N, \\ 0, & \text{otherwise.} \end{cases}$$

The likelihood function is

$$L(N; k_1, k_2, \ldots, k_n) = \begin{cases} \dfrac{1}{N^n}, & 1 \le \max(k_1, \ldots, k_n) \le N, \\ 0, & \text{otherwise.} \end{cases}$$

Clearly the MLE of $N$ is given by

$$\hat{N}(X_1, X_2, \ldots, X_n) = \max(X_1, X_2, \ldots, X_n);$$

if we take any $\hat\alpha < \hat{N}$ as the MLE, then $P_{\hat\alpha}(k_1, k_2, \ldots, k_n) = 0$; and if we take any $\hat\beta > \hat{N}$
as the MLE, then $P_{\hat\beta}(k_1, k_2, \ldots, k_n) = 1/(\hat\beta)^n < 1/(\hat{N})^n = P_{\hat{N}}(k_1, k_2, \ldots, k_n)$.
We see that the MLE $\hat{N}$ is consistent, sufficient, and complete, but not unbiased.


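Example 4. Consider the hypergeometric PMF

$$P_N(x) = \begin{cases} \dbinom{M}{x}\dbinom{N-M}{n-x} \Big/ \dbinom{N}{n}, & \max(0, n-N+M) \le x \le \min(n, M), \\ 0, & \text{otherwise.} \end{cases}$$

To find the MLE $\hat{N} = \hat{N}(X)$ of $N$ consider the ratio

$$R(N) = \frac{P_N(x)}{P_{N-1}(x)} = \frac{N-n}{N} \cdot \frac{N-M}{N-M-n+x}.$$

For values of $N$ for which $R(N) > 1$, $P_N(x)$ increases with $N$, and for values of $N$ for
which $R(N) < 1$, $P_N(x)$ is a decreasing function of $N$:

$$R(N) > 1 \quad \text{if and only if} \quad N < \frac{nM}{x}$$

and

$$R(N) < 1 \quad \text{if and only if} \quad N > \frac{nM}{x}.$$

It follows that $P_N(x)$ reaches its maximum value where $N \approx nM/x$. Thus $\hat{N}(X) = [nM/X]$,
where $[x]$ denotes the largest integer $\le x$.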
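
The closed form $[nM/x]$ can be compared with a brute-force maximization of $P_N(x)$ over $N$. This is a sketch under stated assumptions: the function names are hypothetical, and `search_to` is an arbitrary cap on $N$ chosen here, which must exceed $nM/x$ for the search to be meaningful.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P_N(x) for the hypergeometric PMF of Example 4, as an exact fraction."""
    if x < max(0, n - N + M) or x > min(n, M):
        return Fraction(0)
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

def mle_population_size(x, M, n, search_to=2000):
    """Argmax of P_N(x) over N, searched from the smallest feasible N."""
    start = M + (n - x)  # N must hold the M marked and n - x unmarked items
    return max(range(start, search_to + 1),
               key=lambda N: hypergeom_pmf(x, N, M, n))
```

When $nM/x$ is an integer, $R(N) = 1$ at $N = nM/x$, so $N$ and $N - 1$ both maximize $P_N(x)$ and the search returns the smaller of the two.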


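Example 5. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[\theta - \frac{1}{2}, \theta + \frac{1}{2}]$. The likelihood function is

$$L(\theta; x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \theta - \frac{1}{2} \le \min(x_1, \ldots, x_n) \le \max(x_1, \ldots, x_n) \le \theta + \frac{1}{2}, \\ 0 & \text{otherwise.} \end{cases}$$

Thus $L(\theta; \mathbf{x})$ attains its maximum provided that

$$\theta - \tfrac{1}{2} \le \min(x_1, \ldots, x_n) \quad \text{and} \quad \theta + \tfrac{1}{2} \ge \max(x_1, \ldots, x_n),$$

or when

$$\theta \le \min(x_1, \ldots, x_n) + \tfrac{1}{2} \quad \text{and} \quad \theta \ge \max(x_1, \ldots, x_n) - \tfrac{1}{2}.$$

It follows that every statistic $T(X_1, X_2, \ldots, X_n)$ such that

$$\max_{1 \le i \le n} X_i - \tfrac{1}{2} \le T(X_1, X_2, \ldots, X_n) \le \min_{1 \le i \le n} X_i + \tfrac{1}{2} \qquad (6)$$

is an MLE of $\theta$. Indeed, for $0 < \alpha < 1$,

$$T_\alpha(X_1, \ldots, X_n) = \max_{1 \le i \le n} X_i - \tfrac{1}{2} + \alpha\left(1 + \min_{1 \le i \le n} X_i - \max_{1 \le i \le n} X_i\right)$$

lies in interval (6), and hence for each $\alpha$, $0 < \alpha < 1$, $T_\alpha(X_1, \ldots, X_n)$ is an MLE of $\theta$. In
particular, if $\alpha = \frac{1}{2}$,

$$T_{1/2}(X_1, \ldots, X_n) = \frac{\min X_i + \max X_i}{2}$$

is an MLE of $\theta$.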


Example 6. Let $X \sim b(1, p)$, $p \in [\frac{1}{4}, \frac{3}{4}]$. In this case $L(p; x) = p^x(1-p)^{1-x}$, $x = 0, 1$, and
we cannot differentiate $L(p; x)$ to get the MLE of $p$, since that would lead to $\hat{p} = x$, a value
that does not lie in $\Theta = [\frac{1}{4}, \frac{3}{4}]$. We have

$$L(p; x) = \begin{cases} p, & x = 1, \\ 1 - p, & x = 0, \end{cases}$$

which is maximized if we choose $\hat{p}(x) = \frac{1}{4}$ if $x = 0$, and $\frac{3}{4}$ if $x = 1$. Thus the MLE of $p$
is given by

$$\hat{p}(X) = \frac{2X + 1}{4}.$$

Note that $E_p\hat{p}(X) = (2p + 1)/4$, so that $\hat{p}$ is biased. Also, the mean square error for $\hat{p}$ is

$$E_p(\hat{p}(X) - p)^2 = \frac{1}{16}E_p(2X + 1 - 4p)^2 = \frac{1}{16}.$$

In the sense of the MSE, the MLE is worse than the trivial estimator $\delta(X) = \frac{1}{2}$, for
$E_p(\frac{1}{2} - p)^2 = (\frac{1}{2} - p)^2 \le \frac{1}{16}$ for $p \in [\frac{1}{4}, \frac{3}{4}]$.


Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and suppose that $p \in (0, 1)$. If
$(0, 0, \ldots, 0)$ $((1, 1, \ldots, 1))$ is observed, $\bar{X} = 0$ $(\bar{X} = 1)$ is the MLE, which is not an admissible
value of $p$. Hence an MLE does not exist.


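Example 8. (Oliver [78]). This example illustrates a distribution for which an MLE is
necessarily an actual observation, but not necessarily any particular observation. Let
$X_1, X_2, \ldots, X_n$ be a sample from PDF

$$f_\theta(x) = \begin{cases} \dfrac{2}{\alpha}\dfrac{x}{\theta}, & 0 \le x \le \theta, \\[4pt] \dfrac{2}{\alpha}\dfrac{\alpha - x}{\alpha - \theta}, & \theta \le x \le \alpha, \\[4pt] 0, & \text{otherwise,} \end{cases}$$

where $\alpha > 0$ is a (known) constant. The likelihood function is

$$L(\theta; x_1, x_2, \ldots, x_n) = \left(\frac{2}{\alpha}\right)^n \prod_{x_i \le \theta}\left(\frac{x_i}{\theta}\right) \prod_{x_i > \theta}\left(\frac{\alpha - x_i}{\alpha - \theta}\right),$$

where we have assumed that observations are arranged in increasing order of magnitude,
$0 < x_1 < x_2 < \cdots < x_n < \alpha$. Clearly $L$ is continuous in $\theta$ (even for $\theta =$ some $x_i$) and
differentiable for values of $\theta$ between any two $x_i$'s. Thus, for $x_j < \theta < x_{j+1}$, we have

$$L(\theta) = \left(\frac{2}{\alpha}\right)^n \theta^{-j}(\alpha - \theta)^{-(n-j)} \prod_{i=1}^{j} x_i \prod_{i=j+1}^{n} (\alpha - x_i),$$

$$\frac{\partial \log L}{\partial\theta} = -\frac{j}{\theta} + \frac{n-j}{\alpha - \theta}, \qquad \frac{\partial^2 \log L}{\partial\theta^2} = \frac{j}{\theta^2} + \frac{n-j}{(\alpha - \theta)^2}.$$

It follows that any stationary value that exists must be a minimum, so that there can be no
maximum in any range $x_j < \theta < x_{j+1}$. Moreover, there can be no maximum in $0 < \theta < x_1$
or $x_n < \theta < \alpha$. This follows since, for $0 < \theta < x_1$,

$$L(\theta) = \left(\frac{2}{\alpha}\right)^n (\alpha - \theta)^{-n} \prod_{i=1}^{n} (\alpha - x_i)$$

is a strictly increasing function of $\theta$. By symmetry, $L(\theta)$ is a strictly decreasing function
of $\theta$ in $x_n < \theta < \alpha$. We conclude that an MLE has to be one of the observations.

In particular, let $\alpha = 5$ and $n = 3$, and suppose that the observations, arranged in
increasing order of magnitude, are 1, 2, 4. In this case the MLE can be shown to be $\hat\theta = 1$,
which corresponds to the first-order statistic. If the sample values are 2, 3, 4, the third-order
statistic is the MLE.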
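
Since an MLE must be one of the observations, it can be found by evaluating the likelihood at each $x_i$ and keeping the largest value. The sketch below does this for the two numerical samples just quoted; the function names are hypothetical.

```python
def triangular_likelihood(theta, xs, alpha):
    """L(theta; x) for the density of Example 8 with known alpha."""
    L = 1.0
    for x in xs:
        if x <= theta:
            L *= 2.0 * x / (alpha * theta)                     # rising branch
        else:
            L *= 2.0 * (alpha - x) / (alpha * (alpha - theta))  # falling branch
    return L

def mle_among_observations(xs, alpha):
    """Observation at which the likelihood is largest."""
    return max(xs, key=lambda t: triangular_likelihood(t, xs, alpha))
```

With $\alpha = 5$, the likelihoods at the observations 1, 2, 4 are proportional to $3/16$, $1/6$, $1/8$, and at 2, 3, 4 they are proportional to $2/9$, $1/3$, $3/8$, which agrees with the two MLEs stated in the example.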


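Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(r, 1/\beta)$; $\beta > 0$ and $r > 0$ are both
unknown. The likelihood function is

$$L(\beta, r; x_1, x_2, \ldots, x_n) = \begin{cases} \dfrac{\beta^{nr}}{\{\Gamma(r)\}^n} \prod_{i=1}^{n} x_i^{r-1} \exp\left(-\beta\sum_{i=1}^{n} x_i\right), & x_i \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$\log L(\beta, r) = nr\log\beta - n\log\Gamma(r) + (r-1)\sum_{i=1}^{n}\log x_i - \beta\sum_{i=1}^{n} x_i,$$

$$\frac{\partial \log L(\beta, r)}{\partial\beta} = \frac{nr}{\beta} - \sum_{i=1}^{n} x_i = 0,$$

$$\frac{\partial \log L(\beta, r)}{\partial r} = n\log\beta - n\frac{\Gamma'(r)}{\Gamma(r)} + \sum_{i=1}^{n}\log x_i = 0.$$

The first of the likelihood equations yields $\hat\beta(x_1, x_2, \ldots, x_n) = \hat{r}/\bar{x}$, while the second gives

$$n\log\frac{r}{\bar{x}} + \sum_{i=1}^{n}\log x_i - n\frac{\Gamma'(r)}{\Gamma(r)} = 0,$$

that is,

$$\log r - \frac{\Gamma'(r)}{\Gamma(r)} = \log\bar{x} - \frac{1}{n}\sum_{i=1}^{n}\log x_i,$$

which is to be solved for $\hat{r}$. In this case, the likelihood equation is not easily solvable and
it is necessary to resort to numerical methods, using tables for $\Gamma'(r)/\Gamma(r)$.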
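
In place of tables, the equation for $\hat{r}$ can be solved by bisection, because $\log r - \Gamma'(r)/\Gamma(r)$ decreases from $+\infty$ to 0 on $(0, \infty)$. The sketch below uses only the standard library and approximates $\Gamma'(r)/\Gamma(r)$ by a central difference of `math.lgamma`. The function names, the step `h`, and the bracket `[lo, hi]` are assumptions made here, not part of the text.

```python
import math

def digamma(r, h=1e-5):
    """Central-difference approximation to Gamma'(r)/Gamma(r)."""
    return (math.lgamma(r + h) - math.lgamma(r - h)) / (2.0 * h)

def gamma_mle(xs, lo=1e-3, hi=1e3, iters=200):
    """Approximate root (r_hat, beta_hat) of the two likelihood equations."""
    n = len(xs)
    xbar = sum(xs) / n
    # Right-hand side of the equation for r; positive unless all x_i are equal.
    target = math.log(xbar) - sum(math.log(x) for x in xs) / n

    def g(r):
        return math.log(r) - digamma(r) - target  # decreasing in r

    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if g(mid) > 0:
            lo = mid              # root lies to the right of mid
        else:
            hi = mid
    r_hat = math.sqrt(lo * hi)
    return r_hat, r_hat / xbar    # beta_hat = r_hat / xbar
```

The bracket is adequate only when the right-hand side lies between the values of $\log r - \Gamma'(r)/\Gamma(r)$ at `hi` and `lo`; nearly constant samples would need a larger `hi`.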


Remark 2. We have seen that MLEs may not be unique, although frequently they are. 
Also, they are not necessarily unbiased even if a unique MLE exists. In terms of MSE, 
an MLE may be worthless. Moreover, MLEs may not even exist. We have also seen that 
MLEs are functions of sufficient statistics. This is a general result, which we now prove. 


Theorem 1. Let $T$ be a sufficient statistic for the family of PDFs (PMFs) $\{f_\theta : \theta \in \Theta\}$. If
a unique MLE of $\theta$ exists, then it is a (nonconstant) function of $T$. If an MLE of $\theta$ exists but
is not unique, then one can find an MLE that is a function of $T$.


Proof. Since $T$ is sufficient, we can write

$$L(\theta) = f_\theta(\mathbf{x}) = h(\mathbf{x})g_\theta(T(\mathbf{x})),$$

for all $\mathbf{x}$, all $\theta$, and some $h$ and $g_\theta$. If a unique MLE $\hat\theta$ exists that maximizes $L(\theta)$, it also
maximizes $g_\theta(T(\mathbf{x}))$ and hence $\hat\theta$ is a function of $T$. If an MLE of $\theta$ exists but is not unique,
we choose a particular MLE $\hat\theta$ from the set of all MLEs which is a function of $T$.


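Example 10. Let $X_1, X_2, \ldots, X_n$ be a random sample from $U[\theta - 1, \theta + 1]$, $\theta \in \mathcal{R}$. Then the
likelihood function is given by

$$L(\theta; \mathbf{x}) = \left(\frac{1}{2}\right)^n I_{[\theta - 1 \le x_{(1)} \le x_{(n)} \le \theta + 1]}(\mathbf{x}).$$

We note that $T(\mathbf{X}) = (X_{(1)}, X_{(n)})$ is jointly sufficient for $\theta$ and any $\theta$ satisfying

$$\theta - 1 \le x_{(1)} \le x_{(n)} \le \theta + 1,$$

or, equivalently,

$$x_{(n)} - 1 \le \theta \le x_{(1)} + 1$$

maximizes the likelihood and hence is an MLE for $\theta$. Thus, for $0 \le \alpha \le 1$,

$$\hat\theta_\alpha = \alpha(X_{(n)} - 1) + (1 - \alpha)(X_{(1)} + 1)$$

is an MLE of $\theta$. If $\alpha$ is a constant independent of the $X$'s, then $\hat\theta_\alpha$ is a function of $T$. If,
on the other hand, $\alpha$ depends on the $X$'s, then $\hat\theta_\alpha$ may not be a function of $T$ alone. For
example

$$\hat\theta_\alpha = (\sin^2 X_1)(X_{(n)} - 1) + (\cos^2 X_1)(X_{(1)} + 1)$$

is an MLE of $\theta$ but not a function of $T$ alone.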
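Theorem 2. Suppose that the regularity conditions of the FCR inequality are satisfied and
$\theta$ belongs to an open interval on the real line. If an estimator $\hat\theta$ of $\theta$ attains the FCR lower
bound for the variance, the likelihood equation has a unique solution $\hat\theta$ that maximizes the
likelihood.


Proof. If $\hat\theta$ attains the FCR lower bound, we have [see (8.5.8)]

$$\frac{\partial \log f_\theta(\mathbf{X})}{\partial\theta} = [k(\theta)]^{-1}[\hat\theta(\mathbf{X}) - \theta]$$

with probability 1, and the likelihood equation has a unique solution $\theta = \hat\theta$.
Let us write $A(\theta) = [k(\theta)]^{-1}$. Then

$$\frac{\partial^2 \log f_\theta(\mathbf{X})}{\partial\theta^2} = A'(\theta)(\hat\theta - \theta) - A(\theta),$$

so that

$$\left.\frac{\partial^2 \log f_\theta(\mathbf{X})}{\partial\theta^2}\right|_{\theta = \hat\theta} = -A(\hat\theta).$$

We need only to show that $A(\theta) > 0$.
Recall from (8.5.4) with $\psi(\theta) = \theta$ that

$$E_\theta\left\{[T(\mathbf{X}) - \theta]\frac{\partial \log f_\theta(\mathbf{X})}{\partial\theta}\right\} = 1,$$

and substituting $T(\mathbf{X}) - \theta = k(\theta)\,\partial \log f_\theta(\mathbf{X})/\partial\theta$ we get

$$k(\theta)E_\theta\left[\frac{\partial \log f_\theta(\mathbf{X})}{\partial\theta}\right]^2 = 1.$$

That is,

$$A(\theta) = E_\theta\left[\frac{\partial \log f_\theta(\mathbf{X})}{\partial\theta}\right]^2 > 0,$$

and the proof is complete.


Remark 3. In Theorem 2 we assumed the differentiability of $A(\theta)$ and the existence of the
second-order partial derivative $\partial^2 \log f_\theta/\partial\theta^2$. If the conditions of Theorem 2 are satisfied,
the most efficient estimator is necessarily the MLE. It does not follow, however, that every
MLE is most efficient. For example, in sampling from a normal population, $\hat\sigma^2 = \sum_{1}^{n}(X_i -
\bar{X})^2/n$ is the MLE of $\sigma^2$, but it is not most efficient. Since $\sum_{1}^{n}(X_i - \bar{X})^2/\sigma^2$ is $\chi^2(n-1)$,
we see that $\mathrm{var}(\hat\sigma^2) = 2(n-1)\sigma^4/n^2$, which is not equal to the FCR lower bound, $2\sigma^4/n$.
Note that $\hat\sigma^2$ is not even an unbiased estimator of $\sigma^2$.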


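We next consider an important property of MLEs that is not shared by other methods
of estimation. Often the parameter of interest is not $\theta$ but some function $h(\theta)$. If $\hat\theta$ is the MLE
of $\theta$, what is the MLE of $h(\theta)$? If $\lambda = h(\theta)$ is a one-to-one function of $\theta$, then the inverse
function $h^{-1}(\lambda) = \theta$ is well defined and we can write the likelihood function as a function
of $\lambda$. We have

$$L^*(\lambda; \mathbf{x}) = L(h^{-1}(\lambda); \mathbf{x})$$

so that

$$\sup_{\lambda} L^*(\lambda; \mathbf{x}) = \sup_{\lambda} L(h^{-1}(\lambda); \mathbf{x}) = \sup_{\theta} L(\theta; \mathbf{x}).$$

It follows that the supremum of $L^*$ is achieved at $\lambda = h(\hat\theta)$. Thus $h(\hat\theta)$ is the MLE of $h(\theta)$.
In many applications $\lambda = h(\theta)$ is not one-to-one. It is still tempting to take $\hat\lambda = h(\hat\theta)$ as
the MLE of $\lambda$. The following result provides a justification.


Theorem 3 (Zehna [122]). Let $\{f_\theta : \theta \in \Theta\}$ be a family of PDFs (PMFs), and let $L(\theta)$ be
the likelihood function. Suppose that $\Theta \subseteq \mathcal{R}_k$, $k \ge 1$. Let $h : \Theta \to \Lambda$ be a mapping of $\Theta$
onto $\Lambda$, where $\Lambda$ is an interval in $\mathcal{R}_p$ $(1 \le p \le k)$. If $\hat\theta$ is an MLE of $\theta$, then $h(\hat\theta)$ is an MLE
of $h(\theta)$.


Proof. For each $\lambda \in \Lambda$, let us define

$$\Theta_\lambda = \{\theta : \theta \in \Theta,\ h(\theta) = \lambda\}$$

and

$$M(\lambda; \mathbf{x}) = \sup_{\theta \in \Theta_\lambda} L(\theta; \mathbf{x}).$$

Then $M$ defined on $\Lambda$ is called the likelihood function induced by $h$. If $\hat\theta$ is any MLE of
$\theta$, then $\hat\theta$ belongs to one and only one set, $\Theta_{\hat\lambda}$ say. Since $\hat\theta \in \Theta_{\hat\lambda}$, $\hat\lambda = h(\hat\theta)$. Now

$$M(\hat\lambda; \mathbf{x}) = \sup_{\theta \in \Theta_{\hat\lambda}} L(\theta; \mathbf{x}) \ge L(\hat\theta; \mathbf{x})$$

and $\hat\lambda$ maximizes $M$, since

$$M(\hat\lambda; \mathbf{x}) \le \sup_{\lambda \in \Lambda} M(\lambda; \mathbf{x}) = \sup_{\theta \in \Theta} L(\theta; \mathbf{x}) = L(\hat\theta; \mathbf{x}),$$

so that $M(\hat\lambda; \mathbf{x}) = \sup_{\lambda \in \Lambda} M(\lambda; \mathbf{x})$. It follows that $\hat\lambda$ is an MLE of $h(\theta)$, where $\hat\lambda = h(\hat\theta)$.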


Example 11. Let $X \sim b(1, p)$, $0 < p < 1$, and let $h(p) = \mathrm{var}(X) = p(1-p)$. We wish to
find the MLE of $h(p)$. Note that $\Lambda = (0, \frac{1}{4}]$. The function $h$ is not one-to-one. The MLE of
$p$ based on a sample of size $n$ is $\hat{p}(X_1, \ldots, X_n) = \bar{X}$. Hence the MLE of parameter $h(p)$ is
$h(\bar{X}) = \bar{X}(1 - \bar{X})$.


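Example 12. Consider a random sample from $G(1, \beta)$. It is required to find the MLE of $\beta$
in the following manner. A sample of size $n$ is taken, and it is known only that $k$, $0 < k < n$,
of these observations are $\le M$, where $M$ is a fixed positive number.
Let $p = P\{X_i \le M\} = 1 - e^{-M/\beta}$, so that $-M/\beta = \log(1-p)$ and $\beta = M/\log[1/(1-p)]$.
Therefore, the MLE of $\beta$ is $M/\log[1/(1-\hat{p})]$, where $\hat{p}$ is the MLE of $p$. To
compute the MLE of $p$ we have

$$L(p; x_1, x_2, \ldots, x_n) = p^k(1-p)^{n-k},$$

so that the MLE of $p$ is $\hat{p} = k/n$. Thus the MLE of $\beta$ is

$$\hat\beta = \frac{M}{\log[n/(n-k)]}.$$

Finally we consider some important large-sample properties of MLEs. In the following
we assume that $\{f_\theta, \theta \in \Theta\}$ is a family of PDFs (PMFs), where $\Theta$ is an open interval on $\mathcal{R}$.
The conditions listed below are stated when $f_\theta$ is a PDF. Modifications for the case where
$f_\theta$ is a PMF are obvious and will be left to the reader.


(i) $\partial \log f_\theta/\partial\theta$, $\partial^2 \log f_\theta/\partial\theta^2$, $\partial^3 \log f_\theta/\partial\theta^3$ exist for all $\theta \in \Theta$ and every $x$. Also,

$$\int_{-\infty}^{\infty} \frac{\partial f_\theta(x)}{\partial\theta}\,dx = E_\theta\frac{\partial \log f_\theta(X)}{\partial\theta} = 0 \quad \text{for all } \theta \in \Theta.$$

(ii) $\displaystyle\int_{-\infty}^{\infty} \frac{\partial^2 f_\theta(x)}{\partial\theta^2}\,dx = 0$ for all $\theta \in \Theta$.

(iii) $\displaystyle -\infty < \int_{-\infty}^{\infty} \frac{\partial^2 \log f_\theta(x)}{\partial\theta^2} f_\theta(x)\,dx < 0$ for all $\theta$.

(iv) There exists a function $H(x)$ such that for all $\theta \in \Theta$

$$\left|\frac{\partial^3 \log f_\theta(x)}{\partial\theta^3}\right| < H(x) \quad \text{and} \quad \int_{-\infty}^{\infty} H(x)f_\theta(x)\,dx = M(\theta) < \infty.$$

(v) There exists a function $g(\theta)$ which is positive and twice differentiable for every
$\theta \in \Theta$, and a function $H(x)$ such that for all $\theta$

$$\left|\frac{\partial^2}{\partial\theta^2}\left[g(\theta)\frac{\partial \log f_\theta(x)}{\partial\theta}\right]\right| < H(x) \quad \text{and} \quad \int_{-\infty}^{\infty} H(x)f_\theta(x)\,dx < \infty.$$

Note that the condition (v) is equivalent to condition (iv) with the added qualification
that $g(\theta) = 1$.
We state the following results without proof.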


Theorem 4 (Cramér [17]).


(a) Conditions (i), (iii), and (iv) imply that, with probability approaching 1, as $n \to \infty$,
the likelihood equation has a consistent solution.


(b) Conditions (i) through (iv) imply that a consistent solution $\hat\theta_n$ of the likelihood
equation is asymptotically normal, that is,

$$\sigma^{-1}\sqrt{n}(\hat\theta_n - \theta) \xrightarrow{L} Z,$$

where $Z$ is $N(0, 1)$ and

$$\sigma^2 = \left\{E_\theta\left[\frac{\partial \log f_\theta(X)}{\partial\theta}\right]^2\right\}^{-1}.$$
On occasions one encounters examples where the conditions of Theorem 4 are not 
satisfied and yet a solution of the likelihood equation is consistent and asymptotically 
normal. 


Example 13 (Kulldorf [57]). Let $X \sim N(0, \theta)$, $\theta > 0$. Let $X_1, X_2, \ldots, X_n$ be $n$ independent
observations on $X$. The solution of the likelihood equation is $\hat\theta_n = \sum_{i=1}^{n} X_i^2/n$. Also,
$EX^2 = \theta$, $\mathrm{var}(X^2) = 2\theta^2$, and

$$E_\theta\left[\frac{\partial \log f_\theta(X)}{\partial\theta}\right]^2 = \frac{1}{2\theta^2}.$$

We note that

$$\hat\theta_n \xrightarrow{P} \theta$$

and

$$\sqrt{n}(\hat\theta_n - \theta) \xrightarrow{L} N(0, 2\theta^2).$$

However,

$$\frac{\partial^3 \log f_\theta}{\partial\theta^3} = -\frac{1}{\theta^3} + \frac{3x^2}{\theta^4} \to \infty \quad \text{as } \theta \to 0$$

and is not bounded in $0 < \theta < \infty$. Thus condition (iv) does not hold.


The following theorem covers such cases also.

Theorem 5 (Kulldorf [57]).


(a) Conditions (i), (iii), and (v) imply that, with probability approaching 1 as $n \to \infty$,
the likelihood equation has a solution.

(b) Conditions (i), (ii), (iii), and (v) imply that a consistent solution of the likelihood
equation is asymptotically normal.


Proof of Theorems 4 and 5. For proofs we refer to Cramér [17, p. 500], and Kulldorf [57]. 


Remark 4. It is important to note that the results in Theorems 4 and 5 establish the
consistency of some root of the likelihood equation but not necessarily that of the MLE when
the likelihood equation has several roots. Huzurbazar [47] has shown that under certain 
conditions the likelihood equation has at most one consistent solution and that the like- 
lihood function has a relative maximum for such a solution. Since there may be several 
solutions for which the likelihood function has relative maxima, Cramér’s and Huzur- 
bazar’s results still do not imply that a solution of the likelihood equation that makes the 
likelihood function an absolute maximum is necessarily consistent. 
Wald [115] has shown that under certain conditions the MLE is strongly consistent. It
is important to note that Wald does not make any differentiability assumptions. 

In any event, if the MLE is a unique solution of the likelihood equation, we can use 
Theorems 4 and 5 to conclude that it is consistent and asymptotically normal. Note that 
the asymptotic variance is the same as the lower bound of the FCR inequality. 


Example 14. Consider $X_1, X_2, \ldots, X_n$ iid $P(\lambda)$ RVs, $\lambda \in \Theta = (0, \infty)$. The likelihood equation
has a unique solution, $\hat\lambda(x_1, x_2, \ldots, x_n) = \bar{x}$, which maximizes the likelihood function. We
leave the reader to check that the conditions of Theorem 4 hold and that the MLE $\bar{X}$ is consistent
and asymptotically normal with mean $\lambda$ and variance $\lambda/n$, a result that is immediate
otherwise.


We leave the reader to check that in Example 13 conditions of Theorem 5 are satisfied. 


Remark 5. The invariance and the large sample properties of MLEs permit us to find
MLEs of parametric functions and their limiting distributions. The delta method introduced
in Section 7.5 (Theorem 1) comes in handy in these applications. Suppose in
Example 13 we wish to estimate $\psi(\theta) = \theta^2$. By invariance of MLEs, the MLE of $\psi(\theta)$
is $\psi(\hat\theta_n)$ where $\hat\theta_n = \sum_{1}^{n} X_i^2/n$ is the MLE of $\theta$. Applying Theorem 7.5.1 we see that $\psi(\hat\theta_n)$
is $AN(\theta^2, 8\theta^4/n)$.

In Example 14, suppose we wish to estimate $\psi(\lambda) = P_\lambda(X = 0) = e^{-\lambda}$. Then $\psi(\hat\lambda) =
e^{-\bar{X}}$ is the MLE of $\psi(\lambda)$ and, in view of Theorem 7.5.1, $\psi(\hat\lambda) \sim AN(e^{-\lambda}, \lambda e^{-2\lambda}/n)$.
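
Both limiting variances in Remark 5 come from the same delta-method rule: multiply the asymptotic variance of the MLE by $[\psi'(\theta)]^2$. A minimal sketch follows, with hypothetical function names.

```python
import math

def delta_method_variance(psi_prime, asymptotic_var, n):
    """Variance in the AN limit of psi(theta_hat): psi'(theta)^2 * v(theta) / n."""
    return psi_prime ** 2 * asymptotic_var / n

def example13_variance(theta, n):
    # psi(theta) = theta^2 and sqrt(n)(theta_hat - theta) -> N(0, 2 theta^2)
    return delta_method_variance(2.0 * theta, 2.0 * theta ** 2, n)

def example14_variance(lam, n):
    # psi(lambda) = exp(-lambda) and sqrt(n)(X_bar - lambda) -> N(0, lambda)
    return delta_method_variance(-math.exp(-lam), lam, n)
```

The first function reduces to $8\theta^4/n$ and the second to $\lambda e^{-2\lambda}/n$, matching the two $AN$ statements.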


Remark 6. Neither Theorem 4 nor Theorem 5 guarantees asymptotic normality for a unique
MLE. Consider, for example, a random sample from $U(0, \theta]$. Then $X_{(n)}$ is the unique MLE
for $\theta$ and in Problem 8.2.5 we asked the reader to show that $n(\theta - X_{(n)}) \xrightarrow{L} G(1, \theta)$.


PROBLEMS 8.7 


1. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PMF (PDF) $f_\theta(x)$. Find an MLE for $\theta$ in
each of the following cases:

(a) $f_\theta(x) = \frac{1}{2}e^{-|x-\theta|}$, $-\infty < x < \infty$.
(b) $f_\theta(x) = e^{-(x-\theta)}$, $\theta \le x < \infty$.
(c) $f_\theta(x) = (\theta\alpha)x^{\alpha-1}e^{-\theta x^\alpha}$, $x > 0$, and $\alpha$ known.
(d) $f_\theta(x) = \theta(1-x)^{\theta-1}$, $0 \le x \le 1$, $\theta \ge 1$.

2. Find an MLE, if it exists, in each of the following cases:

(a) $X \sim b(n, \theta)$: both $n$ and $\theta \in [0, 1]$ are unknown, and one observation is available.
(b) $X_1, X_2, \ldots, X_n \sim b(1, \theta)$, $\theta \in [\frac{1}{2}, \frac{3}{4}]$.
(c) $X_1, X_2, \ldots, X_n \sim N(\theta, \theta^2)$, $\theta \in \mathcal{R}$.
(d) $X_1, X_2, \ldots, X_n$ is a sample from

$$P\{X = y_1\} = \frac{1-\theta}{2}, \quad P\{X = y_2\} = \frac{1}{2}, \quad P\{X = y_3\} = \frac{\theta}{2} \quad (0 < \theta < 1).$$

(e) $X_1, X_2, \ldots, X_n \sim N(\theta, \theta)$, $0 < \theta < \infty$.
(f) $X \sim C(\theta, 0)$.

3. Suppose that $n$ observations are taken on an RV $X$ with distribution $N(\mu, 1)$,
but instead of recording all the observations one notes only whether or not the
observation is less than 0. If $\{X < 0\}$ occurs $m$ $(< n)$ times, find the MLE of $\mu$.

4. Let $X_1, X_2, \ldots, X_n$ be a random sample from PDF

$$f(x; \alpha, \beta) = \beta^{-1}e^{-\beta^{-1}(x-\alpha)}, \quad \alpha \le x < \infty, \quad -\infty < \alpha < \infty, \quad \beta > 0.$$

(a) Find the MLE of $(\alpha, \beta)$.
(b) Find the MLE of $P_{\alpha,\beta}\{X_1 \ge 1\}$.

5. Let $X_1, X_2, \ldots, X_n$ be a sample from exponential density $f_\theta(x) = \theta e^{-\theta x}$, $x \ge 0$, $\theta > 0$.
Find the MLE of $\theta$, and show that it is consistent and asymptotically normal.

6. For Problem 8.6.5 find the MLE for $(\mu, \sigma^2)$.

7. For a sample of size 1 taken from $N(\mu, \sigma^2)$, show that no MLE of $(\mu, \sigma^2)$ exists.

8. For Problem 8.6.5 suppose that we wish to estimate $N$ on the basis of observations
$X_1, X_2, \ldots, X_M$:

(a) Find the UMVUE of $N$.

(b) Find the MLE of $N$.

(c) Compare the MSEs of the UMVUE and the MLE.

9. Let $X_{ij}$ $(i = 1, 2, \ldots, s;\ j = 1, 2, \ldots, n)$ be independent RVs where $X_{ij} \sim N(\mu_i, \sigma^2)$, $i =
1, 2, \ldots, s$. Find MLEs for $\mu_1, \mu_2, \ldots, \mu_s$, and $\sigma^2$. Show that the MLE for $\sigma^2$ is not
consistent as $s \to \infty$ ($n$ fixed) (Neyman and Scott [77]).

10. Let $(X, Y)$ have a bivariate normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$.
Suppose that $n$ observations are made on the pair $(X, Y)$, and $N - n$ observations on
$X$, that is, $N - n$ observations on $Y$ are missing. Find the MLEs of $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and
$\rho$ (Anderson [2]).

[Hint: If $f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ is the joint PDF of $(X, Y)$ write

$$f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho) = f_1(x; \mu_1, \sigma_1^2)\, f_{Y|X}(y \mid \beta_x, \sigma_2^2(1-\rho^2)),$$

where $f_1$ is the marginal (normal) PDF of $X$, and $f_{Y|X}$ is the conditional (normal) PDF
of $Y$, given $x$ with mean

$$\beta_x = \left(\mu_2 - \rho\frac{\sigma_2}{\sigma_1}\mu_1\right) + \rho\frac{\sigma_2}{\sigma_1}x$$

and variance $\sigma_2^2(1-\rho^2)$. Maximize the likelihood function first with respect to $\mu_1$
and $\sigma_1^2$ and then with respect to $\mu_2 - \rho(\sigma_2/\sigma_1)\mu_1$, $\rho\sigma_2/\sigma_1$, and $\sigma_2^2(1-\rho^2)$.]

11. In Problem 5, let $\hat\theta$ denote the MLE of $\theta$. Find the MLE of $\mu = EX_1 = 1/\theta$ and its
asymptotic distribution.

12. In Problem 1(d), find the asymptotic distribution of the MLE of $\theta$.

13. In Problem 2(a), find the MLE of $d(\theta) = \theta^2$ and its asymptotic distribution.

14. Let $X_1, X_2, \ldots, X_n$ be a random sample from some DF $F$ on the real line. Suppose
we observe $x_1, x_2, \ldots, x_n$ which are all different. Show that the MLE of $F$ is $F_n^*$, the
empirical DF of the sample.

15. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$. Suppose $\Theta = \{\mu \ge 0\}$. Find the MLE of $\mu$.

16. Let $(X_1, X_2, \ldots, X_{k-1})$ have a multinomial distribution with parameters $n, p_1, \ldots,
p_{k-1}$, $0 \le p_1, p_2, \ldots, p_{k-1} \le 1$, $\sum_{1}^{k-1} p_j \le 1$, where $n$ is known. Find the MLE of
$(p_1, p_2, \ldots, p_{k-1})$.

17. Consider the one parameter exponential density introduced in Section 5.5 in its
natural form with PDF

$$f_\eta(x) = \exp\{\eta T(x) + D(\eta) + S(x)\}.$$

(a) Show that the MGF of $T(X)$ is given by

$$M(t) = \exp\{D(\eta) - D(\eta + t)\}$$

for $t$ in some neighborhood of the origin. Moreover, $E_\eta T(X) = -D'(\eta)$ and
$\mathrm{var}(T(X)) = -D''(\eta)$.
(b) If the equation $E_\eta T(X) = T(x)$ has a solution, it must be the unique MLE of $\eta$.

18. In Problem 1(b) show that the unique MLE of $\theta$ is consistent. Is it asymptotically
normal?


8.8. BAYES AND MINIMAX ESTIMATION 


In this section we consider the problem of point estimation in a decision-theoretic setting. 
We will consider here Bayes and minimax estimation. 

Let $\{f_\theta : \theta \in \Theta\}$ be a family of PDFs (PMFs) and $X_1, X_2, \ldots, X_n$ be a sample from this distribution. Once the sample point $(x_1, x_2, \ldots, x_n)$ is observed, the statistician takes an action on the basis of these data. Let us denote by $A$ the set of all actions or decisions open to the statistician.

Definition 1. A decision function $\delta$ is a statistic that takes values in $A$, that is, $\delta$ is a Borel-measurable function that maps $\mathcal{R}_n$ into $A$.

If $X = x$ is observed, the statistician takes action $\delta(x) \in A$.

Example 1. Let $A = \{a_1, a_2\}$. Then any decision function $\delta$ partitions the space of values of $(X_1, \ldots, X_n)$, namely, $\mathcal{R}_n$, into a set $C$ and its complement $C^c$, such that if $x \in C$ we take action $a_1$ and if $x \in C^c$ action $a_2$ is taken. This is the problem of testing hypotheses, which we will discuss in Chapter 9.

Example 2. Let $A = \Theta$. In this case we face the problem of estimation.

Another element of decision theory is the specification of a loss function, which measures the loss incurred when we take a decision.

Definition 2. Let $A$ be an arbitrary space of actions. A nonnegative function $L$ that maps $\Theta \times A$ into $\mathcal{R}$ is called a loss function.

The value $L(\theta, a)$ is the loss to the statistician if he takes action $a$ when $\theta$ is the true parameter value. If we use the decision function $\delta(X)$ and loss function $L$ and $\theta$ is the true parameter value, then the loss is the RV $L(\theta, \delta(X))$. (As always, we will assume that $L$ is a Borel-measurable function.)


Definition 3. Let $D$ be a class of decision functions that map $\mathcal{R}_n$ into $A$, and let $L$ be a loss function on $\Theta \times A$. The function $R$ defined on $\Theta \times D$ by

$$R(\theta, \delta) = E_\theta L(\theta, \delta(X)) \tag{1}$$

is known as the risk function associated with $\delta$ at $\theta$.

Example 3. Let $A = \Theta \subseteq \mathcal{R}$, $L(\theta, a) = |\theta - a|^2$. Then

$$R(\theta, \delta) = E_\theta L(\theta, \delta(X)) = E_\theta\{\delta(X) - \theta\}^2,$$

which is just the MSE. If we restrict attention to estimators that are unbiased, the risk is just the variance of the estimator.

The basic problem of decision theory is the following: Given a space of actions $A$, and a loss function $L(\theta, a)$, find a decision function $\delta$ in $D$ such that the risk $R(\theta, \delta)$ is "minimum" in some sense for all $\theta \in \Theta$. We need first to specify some criterion for comparing the decision functions $\delta$.

Definition 4. The principle of minimax is to choose $\delta^* \in D$ so that

$$\max_\theta R(\theta, \delta^*) \le \max_\theta R(\theta, \delta) \tag{2}$$

for all $\delta$ in $D$. Such a rule $\delta^*$, if it exists, is called a minimax (decision) rule.

If the problem is one of estimation, that is, if $A = \Theta$, we call $\delta^*$ satisfying (2) a minimax estimator of $\theta$.


Example 4. Let $X \sim b(1,p)$, $p \in \Theta = \{\tfrac{1}{4}, \tfrac{1}{2}\}$, and $A = \{a_1, a_2\}$. Let the loss function be defined as follows.

                  a_1    a_2
    p_1 = 1/4      1      4
    p_2 = 1/2      3      2

The set of decision rules includes four functions: $\delta_1, \delta_2, \delta_3, \delta_4$, defined by $\delta_1(0) = \delta_1(1) = a_1$; $\delta_2(0) = a_1$, $\delta_2(1) = a_2$; $\delta_3(0) = a_2$, $\delta_3(1) = a_1$; and $\delta_4(0) = \delta_4(1) = a_2$. The risk function takes the following values.

    i    R(p_1, δ_i)   R(p_2, δ_i)   max over p of R(p, δ_i)   min over i of max over p
    1        1             3                  3
    2       7/4           5/2                5/2                      5/2
    3      13/4           5/2               13/4
    4        4             2                  4

Thus the minimax solution is $\delta_2(x) = a_1$ if $x = 0$ and $= a_2$ if $x = 1$.
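
The risk table of Example 4 can be reproduced by brute force. The following is a minimal sketch, not part of the original text: it enumerates the four nonrandomized rules, computes each risk exactly with rational arithmetic, and picks the rule with the smallest maximum risk. The names `LOSS`, `risk`, and `minimax_rule` are illustrative choices.

```python
from fractions import Fraction as F
from itertools import product

# Loss table of Example 4: LOSS[p][j] is the loss of action a_{j+1} at p.
P_VALUES = [F(1, 4), F(1, 2)]
LOSS = {F(1, 4): [1, 4], F(1, 2): [3, 2]}

def risk(p, rule):
    """Risk of a rule when X ~ b(1, p); rule[x] is the action index taken at x."""
    return (1 - p) * LOSS[p][rule[0]] + p * LOSS[p][rule[1]]

# The four rules (action at x = 0, action at x = 1), in the order delta_1..delta_4
# of the text: (0, 0), (0, 1), (1, 0), (1, 1).
rules = list(product([0, 1], repeat=2))
max_risk = {rule: max(risk(p, rule) for p in P_VALUES) for rule in rules}

# Minimax principle: smallest worst-case risk.
minimax_rule = min(max_risk, key=max_risk.get)
```

Here `minimax_rule` is the pair `(0, 1)`, that is, take $a_1$ at $x = 0$ and $a_2$ at $x = 1$, which is $\delta_2$.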


The computation of minimax estimators is facilitated by the use of the Bayes estimation method. So far, we have considered $\theta$ as a fixed constant and $f_\theta(x)$ has represented the PDF (PMF) of the RV $X$. In Bayesian estimation we treat $\theta$ as a random variable distributed according to PDF (PMF) $\pi(\theta)$ on $\Theta$. Also, $\pi$ is called the a priori distribution. Now $f(x \mid \theta)$ represents the conditional probability density (or mass) function of RV $X$, given that $\theta \in \Theta$ is held fixed. Since $\pi$ is the distribution of $\theta$, it follows that the joint density (PMF) of $\theta$ and $X$ is given by

$$f(x, \theta) = \pi(\theta) f(x \mid \theta). \tag{3}$$

In this framework $R(\theta, \delta)$ is the conditional average loss, $E\{L(\theta, \delta(X)) \mid \theta\}$, given that $\theta$ is held fixed. (Note that we are using the same symbol to denote the RV $\theta$ and a value assumed by it.)

Definition 5. The Bayes risk of a decision function $\delta$ is defined by

$$R(\pi, \delta) = E_\pi R(\theta, \delta). \tag{4}$$

If $\theta$ is a continuous RV and $X$ is of the continuous type, then

$$R(\pi, \delta) = \int R(\theta, \delta)\pi(\theta)\,d\theta = \int\!\!\int L(\theta, \delta(x)) f(x \mid \theta)\pi(\theta)\,dx\,d\theta = \int\!\!\int L(\theta, \delta(x)) f(x, \theta)\,dx\,d\theta. \tag{5}$$

If $\theta$ is discrete with PMF $\pi$ and $X$ is of the discrete type, then

$$R(\pi, \delta) = \sum_\theta \sum_x L(\theta, \delta(x)) f(x, \theta). \tag{6}$$

Similar expressions may be written in the other two cases.

Definition 6. A decision function $\delta^*$ is known as a Bayes rule (procedure) if it minimizes the Bayes risk, that is, if

$$R(\pi, \delta^*) = \inf_\delta R(\pi, \delta). \tag{7}$$

Definition 7. The conditional distribution of RV $\theta$, given $X = x$, is called the a posteriori probability distribution of $\theta$, given the sample.

Let the joint PDF (PMF) be expressed in the form

$$f(x, \theta) = g(x) h(\theta \mid x), \tag{8}$$

where $g$ denotes the joint marginal density (PMF) of $X$. The a priori PDF (PMF) $\pi(\theta)$ gives the distribution of $\theta$ before the sample is taken, and the a posteriori PDF (PMF) $h(\theta \mid x)$ gives the distribution of $\theta$ after sampling. In terms of $h(\theta \mid x)$ we may write

$$R(\pi, \delta) = \int g(x)\left\{\int L(\theta, \delta(x)) h(\theta \mid x)\,d\theta\right\}dx \tag{9}$$

or

$$R(\pi, \delta) = \sum_x g(x)\left\{\sum_\theta L(\theta, \delta(x)) h(\theta \mid x)\right\}, \tag{10}$$

depending on whether $f$ and $\pi$ are both continuous or both discrete. Similar expressions may be written if only one of $f$ and $\pi$ is discrete.


Theorem 1. Consider the problem of estimation of a parameter $\theta \in \Theta \subseteq \mathcal{R}$ with respect to the quadratic loss function $L(\theta, \delta) = (\theta - \delta)^2$. A Bayes solution is given by

$$\delta(x) = E\{\theta \mid X = x\} \tag{11}$$

($\delta(x)$ defined by (11) is called the Bayes estimator).

Proof. In the continuous case, if $\pi$ is the prior PDF of $\theta$, then

$$R(\pi, \delta) = \int g(x)\left\{\int [\theta - \delta(x)]^2 h(\theta \mid x)\,d\theta\right\}dx,$$

where $g$ is the marginal PDF of $X$, and $h$ is the conditional PDF of $\theta$, given $x$. The Bayes rule is a function $\delta$ that minimizes $R(\pi, \delta)$. Minimization of $R(\pi, \delta)$ is the same as minimization of

$$\int [\theta - \delta(x)]^2 h(\theta \mid x)\,d\theta,$$

which is minimum if and only if

$$\delta(x) = E\{\theta \mid x\}.$$

The proof for the remaining cases is similar.

Remark 1. The argument used in Theorem 1 shows that a Bayes estimator is one which minimizes $E\{L(\theta, \delta(X)) \mid X\}$. Theorem 1 is a special case which says that if $L(\theta, \delta(X)) = [\theta - \delta(X)]^2$ the function

$$\delta(x) = \int \theta\, h(\theta \mid x)\,d\theta$$

is the Bayes estimator for $\theta$ with respect to $\pi$, the a priori distribution on $\Theta$.

Remark 2. Suppose $T(X)$ is sufficient for the parameter $\theta$. Then it is easily seen that the posterior distribution of $\theta$ given $x$ depends on $x$ only through $T$ and it follows that the Bayes estimator of $\theta$ is a function of $T$.


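Example 5. Let $X \sim b(n,p)$ and $L(p, \delta(x)) = [p - \delta(x)]^2$. Let $\pi(p) = 1$ for $0 < p < 1$ be the a priori PDF of $p$. Then

$$h(p \mid x) = \frac{\binom{n}{x} p^x (1-p)^{n-x}}{\int_0^1 \binom{n}{x} p^x (1-p)^{n-x}\,dp}.$$

It follows that

$$E\{p \mid x\} = \int_0^1 p\, h(p \mid x)\,dp = \frac{x+1}{n+2}.$$

Hence the Bayes estimator is

$$\delta^*(X) = \frac{X+1}{n+2}.$$

The Bayes risk is

$$R(\pi, \delta^*) = \int_0^1 \pi(p) \sum_{x=0}^{n} \left[\frac{x+1}{n+2} - p\right]^2 \binom{n}{x} p^x (1-p)^{n-x}\,dp = \frac{1}{(n+2)^2}\int_0^1 [np(1-p) + (1-2p)^2]\,dp = \frac{1}{6(n+2)}.$$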
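
Both closed forms in Example 5 can be checked exactly. The sketch below is an illustration rather than the book's method: it uses the identity $\int_0^1 p^a (1-p)^b\,dp = a!\,b!/(a+b+1)!$ for nonnegative integers $a, b$ and rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction as F
from math import comb, factorial

def beta_int(a, b):
    """Exact integral of p^a (1-p)^b over [0, 1] for integers a, b >= 0."""
    return F(factorial(a) * factorial(b), factorial(a + b + 1))

def posterior_mean(x, n):
    """E{p | x} under the uniform prior, as a ratio of two beta integrals."""
    return beta_int(x + 1, n - x) / beta_int(x, n - x)

def bayes_risk(n):
    """Bayes risk of (X + 1)/(n + 2): sum over x, integrate over p exactly."""
    total = F(0)
    for x in range(n + 1):
        d = F(x + 1, n + 2)
        # integral of (d - p)^2 p^x (1-p)^(n-x) dp, expanded term by term
        inner = (d * d * beta_int(x, n - x)
                 - 2 * d * beta_int(x + 1, n - x)
                 + beta_int(x + 2, n - x))
        total += comb(n, x) * inner
    return total
```

For any `n` tried, `bayes_risk(n)` agrees with $1/[6(n+2)]$ and `posterior_mean(x, n)` with $(x+1)/(n+2)$.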


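Example 6. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$ RVs, and let the a priori PDF of $\mu$ be $N(0, 1)$. Also, let $L(\mu, \delta) = [\mu - \delta(X)]^2$. Then

$$h(\mu \mid x) = \frac{f(x, \mu)}{g(x)} = \frac{\pi(\mu) f(x \mid \mu)}{g(x)},$$

where

$$g(x) = \int f(x, \mu)\,d\mu = \frac{1}{(2\pi)^{(n+1)/2}} \exp\left\{-\frac{1}{2}\sum_{1}^{n} x_i^2\right\} \int_{-\infty}^{\infty} \exp\left\{-\frac{n+1}{2}\left(\mu^2 - \frac{2\mu n\bar{x}}{n+1}\right)\right\}d\mu = \frac{(n+1)^{-1/2}}{(2\pi)^{n/2}} \exp\left\{-\frac{1}{2}\sum_{1}^{n} x_i^2 + \frac{n^2\bar{x}^2}{2(n+1)}\right\}.$$

It follows that

$$h(\mu \mid x) = \frac{1}{\sqrt{2\pi/(n+1)}} \exp\left\{-\frac{n+1}{2}\left(\mu - \frac{n\bar{x}}{n+1}\right)^2\right\},$$

and the Bayes estimator is

$$\delta^*(x) = E\{\mu \mid x\} = \frac{n\bar{x}}{n+1} = \frac{\sum_{1}^{n} x_i}{n+1}.$$

The Bayes risk is the expected posterior variance, which here does not depend on $x$:

$$R(\pi, \delta^*) = \frac{1}{n+1}.$$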


The quadratic loss function used in Theorem 1 is but one example of a loss function in frequent use. Some of many other loss functions that may be used are

$$|\theta - \delta(X)|, \qquad \frac{|\theta - \delta(X)|^2}{|\theta|}, \qquad |\theta - \delta(X)|^4, \qquad \text{and} \qquad \left(\frac{|\theta - \delta(X)|}{|\theta| + 1}\right)^{1/2}.$$


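Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. It is required to find a Bayes estimator of $\mu$ of the form $\delta(x_1, \ldots, x_n) = \delta(\bar{x})$, where $\bar{x} = \sum_{1}^{n} x_i/n$, using the loss function $L(\mu, \delta) = |\mu - \delta(\bar{x})|$. From the argument used in the proof of Theorem 1 (or by Remark 1), the Bayes estimator is one that minimizes the integral $\int |\mu - \delta(\bar{x})|\, h(\mu \mid \bar{x})\,d\mu$. This will be the case if we choose $\delta$ to be the median of the conditional distribution (see Problem 3.2.5).

Let the a priori distribution of $\mu$ be $N(\theta, \tau^2)$. Since $\bar{X} \sim N(\mu, \sigma^2/n)$, we have

$$f(\bar{x}, \mu) = \frac{\sqrt{n}}{2\pi\sigma\tau} \exp\left\{-\frac{(\mu - \theta)^2}{2\tau^2} - \frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right\}.$$

Writing

$$(\bar{x} - \mu)^2 = (\bar{x} - \theta + \theta - \mu)^2 = (\bar{x} - \theta)^2 - 2(\bar{x} - \theta)(\mu - \theta) + (\mu - \theta)^2,$$

we see that the exponent in $f(\bar{x}, \mu)$ is

$$-\frac{1}{2}\left\{(\mu - \theta)^2\left(\frac{1}{\tau^2} + \frac{n}{\sigma^2}\right) - \frac{2n}{\sigma^2}(\bar{x} - \theta)(\mu - \theta) + \frac{n}{\sigma^2}(\bar{x} - \theta)^2\right\}.$$

It follows that the joint PDF of $\mu$ and $\bar{X}$ is bivariate normal with means $\theta, \theta$, variances $\tau^2$, $\tau^2 + (\sigma^2/n)$, and correlation coefficient $\tau/\sqrt{\tau^2 + (\sigma^2/n)}$. The marginal of $\bar{X}$ is $N(\theta, \tau^2 + (\sigma^2/n))$, and the conditional distribution of $\mu$, given $\bar{x}$, is normal with mean

$$\theta + \frac{\tau}{\sqrt{\tau^2 + (\sigma^2/n)}} \cdot \frac{\tau}{\sqrt{\tau^2 + (\sigma^2/n)}}\,(\bar{x} - \theta) = \frac{\theta(\sigma^2/n) + \bar{x}\tau^2}{\tau^2 + (\sigma^2/n)}$$

and variance

$$\tau^2\left[1 - \frac{\tau^2}{\tau^2 + (\sigma^2/n)}\right] = \frac{\tau^2\sigma^2/n}{\tau^2 + (\sigma^2/n)}$$

(see the proof of Theorem 1). The Bayes estimator is therefore the median of this conditional distribution, and since the distribution is symmetric about the mean,

$$\delta^*(\bar{x}) = \frac{\theta(\sigma^2/n) + \bar{x}\tau^2}{\tau^2 + (\sigma^2/n)}$$

is the Bayes estimator of $\mu$.

Clearly $\delta^*$ is also the Bayes estimator under the quadratic loss function $L(\mu, \delta) = [\mu - \delta(X)]^2$.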


Key to the derivation of a Bayes estimator is the a posteriori distribution, $h(\theta \mid x)$. The derivation of the a posteriori distribution $h(\theta \mid x)$, however, is a three-step process:

1. Find the joint distribution of $X$ and $\theta$ given by $\pi(\theta) f(x \mid \theta)$.

2. Find the marginal distribution with PDF (PMF) $g(x)$ by integrating (summing) over $\theta \in \Theta$.

3. Divide the joint PDF (PMF) by $g(x)$.

It is not always easy to go through these steps in practice. It may not be possible to obtain $h(\theta \mid x)$ in a closed form.


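Example 8. Let $X \sim N(\mu, 1)$ and the prior PDF of $\mu$ be given by

$$\pi(\mu) = \frac{e^{-(\mu - \theta)}}{[1 + e^{-(\mu - \theta)}]^2},$$

where $\theta$ is a location parameter. Then the joint PDF of $X$ and $\mu$ is given by

$$f(x, \mu) = \frac{1}{\sqrt{2\pi}}\, e^{-(x - \mu)^2/2}\, \frac{e^{-(\mu - \theta)}}{[1 + e^{-(\mu - \theta)}]^2},$$

so that the marginal PDF of $X$ is

$$g(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{-(x - \mu)^2/2}\, e^{-(\mu - \theta)}}{[1 + e^{-(\mu - \theta)}]^2}\,d\mu.$$

A closed form for $g$ is not known.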
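
Although $g$ has no closed form here, the three steps (joint, marginal, divide) can still be carried out numerically. The sketch below is one possible illustration, assuming a trapezoidal rule on a wide grid of $\mu$ values is accurate enough; the grid limits, grid size, and function names are arbitrary choices made for this sketch.

```python
import math

def normal_pdf(x, mu):
    """N(mu, 1) density at x."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def logistic_prior(mu, theta):
    """Logistic prior density; abs() uses its symmetry about theta and avoids overflow."""
    e = math.exp(-abs(mu - theta))
    return e / (1.0 + e) ** 2

def posterior_on_grid(x, theta, lo=-30.0, hi=30.0, m=6001):
    """Return (grid, posterior values, g(x)) using the trapezoidal rule."""
    step = (hi - lo) / (m - 1)
    grid = [lo + i * step for i in range(m)]
    joint = [logistic_prior(mu, theta) * normal_pdf(x, mu) for mu in grid]  # step 1
    g = step * (sum(joint) - 0.5 * (joint[0] + joint[-1]))                   # step 2
    post = [j / g for j in joint]                                           # step 3
    return grid, post, g

def posterior_mean(x, theta):
    """Bayes estimator under quadratic loss, computed numerically."""
    grid, post, _ = posterior_on_grid(x, theta)
    step = grid[1] - grid[0]
    vals = [mu * h for mu, h in zip(grid, post)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
```

By symmetry the posterior mean equals $\theta$ when $x = \theta$, and otherwise it lies between $\theta$ and $x$, which gives a quick sanity check on the numerics.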


To avoid problems of integration such as that in Example 8, statisticians use the so-called conjugate prior distributions. Often there is a natural parametric family of distributions such that the posterior distributions also belong to the same family. These priors make the computations much easier.

Definition 8. Let $X \sim f(x \mid \theta)$ and $\pi(\theta)$ be the prior distribution on $\Theta$. Then $\pi$ is said to be a conjugate prior family if the corresponding posterior distribution $h(\theta \mid x)$ also belongs to the same family as $\pi(\theta)$.

Example 9. Consider Example 6 where $\pi(\mu)$ is $N(0, 1)$ and $h(\mu \mid x)$ is $N\left(\frac{n\bar{x}}{n+1}, \frac{1}{n+1}\right)$ so that both $h$ and $\pi$ belong to the same family. Hence $N(0, 1)$ is a conjugate prior for $\mu$.

Example 10. Let $X \sim b(n,p)$, $0 < p < 1$, and $\pi(p)$ be the beta PDF with parameters $(\alpha, \beta)$. Then

$$h(p \mid x) = \frac{p^{x+\alpha-1}(1-p)^{n-x+\beta-1}}{\int_0^1 p^{x+\alpha-1}(1-p)^{n-x+\beta-1}\,dp} = \frac{p^{x+\alpha-1}(1-p)^{n-x+\beta-1}}{B(x+\alpha, n-x+\beta)},$$

which is also a beta density. Thus the family of beta distributions is a conjugate family of priors for $p$.
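
The beta update of Example 10 amounts to adding counts to the prior parameters. The following is a minimal sketch with hypothetical function names, using exact rational arithmetic for the posterior mean.

```python
from fractions import Fraction as F

def beta_posterior(alpha, beta, x, n):
    """Posterior beta parameters after x successes in n binomial trials."""
    return alpha + x, beta + n - x

def beta_mean(alpha, beta):
    """Mean of a beta(alpha, beta) distribution, alpha / (alpha + beta)."""
    return F(alpha) / (F(alpha) + F(beta))
```

With the uniform prior, beta(1, 1), `beta_mean(*beta_posterior(1, 1, x, n))` returns $(x+1)/(n+2)$, the Bayes estimator of Example 5. Updating on two batches of data in sequence gives the same posterior as updating once on the pooled data.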


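Conjugate priors are popular because whenever the prior family is parametric the posterior distributions are always computable, $h(\theta \mid x)$ being an updated parametric version of $\pi(\theta)$. One no longer needs to go through a computation of $g$, the marginal PDF (PMF) of $X$. Once $h(\theta \mid x)$ is known $g$, if needed, is easily determined from

$$g(x) = \frac{\pi(\theta) f(x \mid \theta)}{h(\theta \mid x)}.$$

Thus in Example 10, where $h(p \mid x)$ is beta $(x+\alpha, n-x+\beta)$, we see easily that $g(x) = \binom{n}{x} B(x+\alpha, n-x+\beta)/B(\alpha, \beta)$, while in Example 6 $g$ is given by

$$g(x) = \frac{1}{(n+1)^{1/2}(2\pi)^{n/2}} \exp\left\{-\frac{1}{2}\sum_{1}^{n} x_i^2 + \frac{n^2\bar{x}^2}{2(n+1)}\right\}.$$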
Conjugate priors are usually associated with a wide class of sampling distributions, namely, the exponential family of distributions.

Natural Conjugate Priors

    Sampling PDF (PMF), f(x|θ)    Prior π(θ)    Posterior h(θ|x)
    N(θ, σ²)                      N(μ, τ²)      N((σ²μ + xτ²)/(σ² + τ²), σ²τ²/(σ² + τ²))
    G(ν, β)                       G(α, β)       G(α + ν, β + x)
    b(n, p)                       B(α, β)       B(α + x, β + n − x)
    P(λ)                          G(α, β)       G(α + x, β + 1)
    NB(r; p)                      B(α, β)       B(α + r, β + x)
    G(γ, 1/θ)                     G(α, β)       G(α + ν, β + x)


Another easy way is to use a noninformative prior $\pi(\theta)$, though one needs some integration to obtain $g(x)$.

Definition 9. A PDF $\pi(\theta)$ is said to be a noninformative prior if it contains no information about $\theta$, that is, the distribution does not favor any value of $\theta$ over others.

Example 11. Some simple examples of noninformative priors are $\pi(\theta) = 1$, $\pi(\theta) = 1/\theta$, and $\pi(\theta) = \sqrt{I(\theta)}$. These may quite often lead to infinite mass and the PDF may be improper (that is, does not integrate to 1).


Calculation of $h(\theta \mid x)$ becomes easier, bypassing the calculation of $g(x)$, when $f(x \mid \theta)$ is invariant under a group $G$ of transformations, following Fraser's [33] structural theory.

Let $G$ be a group of Borel-measurable functions on $\mathcal{R}_n$ onto itself. The group operation is composition, that is, if $g_1$ and $g_2$ are mappings from $\mathcal{R}_n$ onto $\mathcal{R}_n$, $g_2 g_1$ is defined by $g_2 g_1(x) = g_2(g_1(x))$. Also, $G$ is closed under composition and inverse, so that all maps in $G$ are one-to-one. We define the group $G$ of affine linear transformations $g = \{a, b\}$ by

$$gx = a + bx, \quad a \in \mathcal{R}, \quad b > 0.$$

The inverse of $\{a, b\}$ is

$$\{a, b\}^{-1} = \left\{-\frac{a}{b}, \frac{1}{b}\right\},$$

and the composition of $\{a, b\}$ and $\{c, d\} \in G$ is given by

$$\{a, b\}\{c, d\}(x) = \{a, b\}(c + dx) = a + b(c + dx) = (a + bc) + bdx = \{a + bc, bd\}(x).$$

In particular,

$$\{a, b\}\{a, b\}^{-1} = \{a, b\}\left\{-\frac{a}{b}, \frac{1}{b}\right\} = \{0, 1\} = e.$$
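
The composition and inverse rules for the affine group can be checked mechanically. The class below is a small illustrative sketch; the name `Affine` and its interface are assumptions of this sketch, not notation from the text.

```python
from fractions import Fraction as F

class Affine:
    """The map x -> a + b*x with b > 0, written {a, b} in the text."""

    def __init__(self, a, b):
        if b <= 0:
            raise ValueError("b must be positive")
        self.a, self.b = F(a), F(b)

    def __call__(self, x):
        return self.a + self.b * x

    def __mul__(self, other):
        # {a, b}{c, d} = {a + b*c, b*d}: apply `other` first, then `self`.
        return Affine(self.a + self.b * other.a, self.b * other.b)

    def inverse(self):
        # {a, b}^{-1} = {-a/b, 1/b}
        return Affine(-self.a / self.b, 1 / self.b)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)
```

For example, with `g = Affine(2, 3)` and `h = Affine(-1, F(1, 2))`, the product `g * h` acts on a point exactly as `g(h(x))`, and `g * g.inverse()` equals the identity `Affine(0, 1)`.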
Example 12. Let $X \sim N(\mu, 1)$ and let $G$ be the group of translations $G = \{\{b, 1\}, -\infty < b < \infty\}$. Let $X_1, \ldots, X_n$ be a sample from $N(\mu, 1)$. Then, we may write

$$X_i = \{\mu, 1\}Z_i, \quad i = 1, \ldots, n,$$

where $Z_1, \ldots, Z_n$ are iid $N(0, 1)$. It is clear that $\bar{Z} \sim N(0, 1/n)$ with PDF

$$\sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}\bar{z}^2\right\}$$

and there is a one-to-one correspondence between values of $\{\bar{z}, 1\}$ and $\{\mu, 1\}$ given by

$$\{\bar{x}, 1\} = \{\mu, 1\}\{\bar{z}, 1\} = \{\mu + \bar{z}, 1\}.$$

Thus $\bar{x} = \mu + \bar{z}$ with inverse map $\bar{z} = \bar{x} - \mu$. We fix $\bar{x}$ and consider the variation in $\bar{z}$ as a function of $\mu$. Changing the PDF element of $\bar{z}$ to $\mu$ we get

$$\sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\mu - \bar{x})^2\right\}$$

as the posterior of $\mu$ given $\bar{x}$ with prior $\pi(\mu) = 1$.


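Example 13. Let $X \sim N(0, \sigma^2)$ and consider the scale group $G = \{\{0, c\}, c > 0\}$. Let $X_1, X_2, \ldots, X_n$ be iid $N(0, \sigma^2)$. Write

$$X_i = \{0, \sigma\}Z_i, \quad i = 1, 2, \ldots, n,$$

where $Z_i$ are iid $N(0, 1)$ RVs. Then the RV $nS_z^2 = \sum_{i=1}^{n} Z_i^2 \sim \chi^2(n)$ with PDF

$$\frac{1}{2^{n/2}\Gamma(n/2)} \exp\left\{-\frac{ns_z^2}{2}\right\}(ns_z^2)^{n/2 - 1}.$$

The values of $\{0, s_z\}$ are in one-to-one correspondence with those of $\{0, \sigma\}$ through

$$\{0, s_x\} = \{0, \sigma\}\{0, s_z\},$$

where $nS_x^2 = \sum_{i=1}^{n} X_i^2$, so that $s_x = \sigma s_z$. Considering the variation in $s_z$ as a function of $\sigma$ for fixed $s_x$ we see that $|ds_z| = (s_x/\sigma^2)\,d\sigma$. Changing the PDF element of $s_z$ to $\sigma$ we get the PDF of $\sigma$ as

$$\frac{1}{2^{n/2}\Gamma(n/2)} \exp\left\{-\frac{ns_x^2}{2\sigma^2}\right\}\left(\frac{ns_x^2}{\sigma^2}\right)^{n/2 - 1} \frac{2ns_x^2}{\sigma^3}, \quad \sigma > 0,$$

where the last factor is the Jacobian $|d(ns_z^2)/d\sigma|$. This is the same as the posterior of $\sigma$ given $s_x$ with prior $\pi(\sigma) = 1/\sigma$.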


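Example 14. Let $X_1, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$ and consider the affine linear group $G = \{\{a, b\}, -\infty < a < \infty, b > 0\}$. Then

$$X_i = \{\mu, \sigma\}Z_i, \quad i = 1, \ldots, n,$$

where the $Z_i$'s are iid $N(0, 1)$. We know that $\bar{Z}$ and $(n-1)S_z^2$ are independent, $\bar{Z} \sim N(0, 1/n)$ and $(n-1)S_z^2 \sim \chi^2(n-1)$, so that their joint PDF is given by

$$\sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n\bar{z}^2}{2}\right\} \cdot \frac{1}{2^{(n-1)/2}\Gamma\left(\frac{n-1}{2}\right)} \left[(n-1)s_z^2\right]^{(n-1)/2 - 1} \exp\left\{-\frac{(n-1)s_z^2}{2}\right\}.$$

Further, the values of $\{\bar{z}, s_z\}$ are in one-to-one correspondence with the values of $\{\mu, \sigma\}$ through

$$\{\bar{x}, s_x\} = \{\mu, \sigma\}\{\bar{z}, s_z\} = \{\mu + \sigma\bar{z}, \sigma s_z\} \implies \bar{z} = \frac{\bar{x} - \mu}{\sigma} \quad \text{and} \quad s_z = \frac{s_x}{\sigma}.$$

Consider the variation of $(\bar{z}, s_z)$ as a function of $(\mu, \sigma)$ for fixed $(\bar{x}, s_x)$. The Jacobian of the transformation from $\{\bar{z}, s_z\}$ to $\{\mu, \sigma\}$ is, in absolute value, $s_x/\sigma^3$, and the resulting joint PDF of $(\mu, \sigma)$ is proportional to

$$\sigma^{-(n+1)} \exp\left\{-\frac{n(\bar{x} - \mu)^2 + (n-1)s_x^2}{2\sigma^2}\right\}.$$

This is the PDF one obtains if $\pi(\mu) = 1$ and $\pi(\sigma) = 1/\sigma$ and $\mu$ and $\sigma$ are independent RVs.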


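The following theorem provides a method for determining minimax estimators.

Theorem 2. Let $\{f_\theta : \theta \in \Theta\}$ be a family of PDFs (PMFs), and suppose that an estimator $\delta^*$ of $\theta$ is a Bayes estimator corresponding to an a priori distribution $\pi$ on $\Theta$. If the risk function $R(\theta, \delta^*)$ is constant on $\Theta$, then $\delta^*$ is a minimax estimator for $\theta$.

Proof. Since $\delta^*$ is the Bayes estimator of $\theta$ with constant risk $r^*$ (free of $\theta$), we have

$$r^* = R(\pi, \delta^*) = \int_\Theta R(\theta, \delta^*)\pi(\theta)\,d\theta = \inf_{\delta \in D} \int_\Theta R(\theta, \delta)\pi(\theta)\,d\theta \le \inf_{\delta \in D} \sup_{\theta \in \Theta} R(\theta, \delta),$$

where the inequality holds because $\int_\Theta R(\theta, \delta)\pi(\theta)\,d\theta \le \sup_{\theta \in \Theta} R(\theta, \delta)$ for every $\delta \in D$. Similarly, since $r^* = R(\theta, \delta^*)$ for all $\theta \in \Theta$, we have

$$r^* = \sup_{\theta \in \Theta} R(\theta, \delta^*) \ge \inf_{\delta \in D} \sup_{\theta \in \Theta} R(\theta, \delta).$$

Together we then have

$$\sup_{\theta \in \Theta} R(\theta, \delta^*) = \inf_{\delta \in D} \sup_{\theta \in \Theta} R(\theta, \delta),$$

which means $\delta^*$ is minimax.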


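The following examples show how to obtain constant risk estimators and the suitable prior distribution.

Example 15. (Hodges and Lehmann [43]). Let $X \sim b(n,p)$, $0 < p < 1$. We seek a minimax estimator of $p$ of the form $\alpha X + \beta$, using the squared error loss function. We have

$$R(p, \delta) = E_p\{\alpha X + \beta - p\}^2 = E_p\{\alpha(X - np) + \beta + (\alpha n - 1)p\}^2 = [(\alpha n - 1)^2 - \alpha^2 n]p^2 + [\alpha^2 n + 2\beta(\alpha n - 1)]p + \beta^2,$$

which is a quadratic in $p$. To find $\alpha$ and $\beta$ such that $R(p, \delta)$ is constant for all $p \in \Theta$, we set the coefficients of $p^2$ and $p$ equal to 0 to get

$$(\alpha n - 1)^2 - \alpha^2 n = 0 \quad \text{and} \quad \alpha^2 n + 2\beta(\alpha n - 1) = 0.$$

It follows that

$$\alpha = \frac{1}{\sqrt{n}(1 + \sqrt{n})} \ \text{or} \ \frac{1}{\sqrt{n}(\sqrt{n} - 1)} \quad \text{and} \quad \beta = \frac{1}{2(1 + \sqrt{n})} \ \text{or} \ -\frac{1}{2(\sqrt{n} - 1)}.$$

Since $0 < p < 1$, we discard the second set of roots for both $\alpha$ and $\beta$, and then the estimator is of the form

$$\delta^*(x) = \frac{x}{\sqrt{n}(1 + \sqrt{n})} + \frac{1}{2(1 + \sqrt{n})}.$$

It remains to show that $\delta^*$ is Bayes against some a priori PDF $\pi$. Consider the natural conjugate a priori PDF

$$\pi(p) = [B(\alpha', \beta')]^{-1} p^{\alpha' - 1}(1 - p)^{\beta' - 1}, \quad 0 \le p \le 1, \quad \alpha', \beta' > 0.$$

The a posteriori PDF of $p$, given $x$, is expressed by

$$h(p \mid x) = \frac{p^{x + \alpha' - 1}(1 - p)^{n - x + \beta' - 1}}{B(x + \alpha', n - x + \beta')}.$$

It follows that

$$E\{p \mid x\} = \frac{B(x + \alpha' + 1, n - x + \beta')}{B(x + \alpha', n - x + \beta')} = \frac{x + \alpha'}{n + \alpha' + \beta'},$$

which is the Bayes estimator for a squared error loss. For this to be of the form $\delta^*$, we must have

$$\frac{1}{\sqrt{n}(1 + \sqrt{n})} = \frac{1}{n + \alpha' + \beta'} \quad \text{and} \quad \frac{1}{2(1 + \sqrt{n})} = \frac{\alpha'}{n + \alpha' + \beta'},$$

giving $\alpha' = \beta' = \sqrt{n}/2$. It follows that the estimator $\delta^*(x)$ is minimax with constant risk

$$R(p, \delta^*) = \frac{1}{4(1 + \sqrt{n})^2} \quad \text{for all } p \in [0, 1].$$

Note that the UMVUE (which is also the MLE) is $\delta(X) = X/n$ with risk $R(p, \delta) = p(1 - p)/n$. Comparing the two risks (Figs. 1 and 2), we see that

$$\frac{p(1 - p)}{n} \le \frac{1}{4(1 + \sqrt{n})^2} \quad \text{if and only if} \quad \left|p - \frac{1}{2}\right| \ge \frac{\sqrt{1 + 2\sqrt{n}}}{2(1 + \sqrt{n})},$$

so that $R(p, \delta^*) < R(p, \delta)$ in the interval $(\tfrac{1}{2} - a_n, \tfrac{1}{2} + a_n)$, where $a_n \to 0$ as $n \to \infty$.

[Fig. 1. Comparison of $R(p, \delta)$ and $R(p, \delta^*)$, $n = 1$. Horizontal axis: $p$ from 0 to 1; $R(p, \delta)$ peaks at 0.25 and $R(p, \delta^*) = 1/16$.]

[Fig. 2. Comparison of $R(p, \delta)$ and $R(p, \delta^*)$, $n = 9$. Horizontal axis: $p$ from 0 to 1; $R(p, \delta^*) = 1/64$.]

Moreover,

$$\frac{\sup_p R(p, \delta)}{\sup_p R(p, \delta^*)} = \frac{1/4n}{1/[4(1 + \sqrt{n})^2]} = \frac{n + 2\sqrt{n} + 1}{n} \to 1 \quad \text{as } n \to \infty.$$

Clearly, we would prefer the minimax estimator if $n$ is small and would prefer the UMVUE because of its simplicity if $n$ is large.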
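
The constant-risk property and the crossover points in Example 15 are easy to confirm numerically. The sketch below computes the exact squared-error risk of a linear estimator $\alpha X + \beta$ by summing over the binomial PMF; the function names are illustrative.

```python
from math import comb, sqrt

def risk(n, p, a, b):
    """Exact squared-error risk of the estimator a*X + b when X ~ b(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) * (a * x + b - p) ** 2
               for x in range(n + 1))

def minimax_coefficients(n):
    """(alpha, beta) of the constant-risk estimator in Example 15."""
    r = sqrt(n)
    return 1.0 / (r * (1.0 + r)), 1.0 / (2.0 * (1.0 + r))

def crossover_halfwidth(n):
    """a_n: the minimax rule has smaller risk than X/n when |p - 1/2| < a_n."""
    r = sqrt(n)
    return sqrt(1.0 + 2.0 * r) / (2.0 * (1.0 + r))
```

For $n = 9$ the coefficients are $1/12$ and $1/8$, the risk is $1/64$ at every $p$, and the two risk curves meet at $p = 1/2 \pm \sqrt{7}/8$.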


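Example 16. (Hodges and Lehmann [43]). A lot contains $N$ elements, of which $D$ are defective. A random sample of size $n$ produces $X$ defectives. We wish to estimate $D$. Clearly,

$$P_D\{X = k\} = \frac{\binom{D}{k}\binom{N - D}{n - k}}{\binom{N}{n}}, \quad E_D X = n\frac{D}{N}, \quad \text{and} \quad \sigma_D^2 = \frac{nD(N - n)(N - D)}{N^2(N - 1)}.$$

Proceeding as in Example 15, we find a linear function of $X$ with constant risk. Indeed, $E_D(\alpha X + \beta - D)^2 = \beta^2$ when

$$\alpha = \frac{N}{n + \sqrt{n(N - n)/(N - 1)}} \quad \text{and} \quad \beta = \frac{N}{2}\left(1 - \frac{\alpha n}{N}\right).$$

We show that $\alpha X + \beta$ is the Bayes estimator corresponding to the a priori PMF

$$P\{D = d\} = \int_0^1 \binom{N}{d} p^d (1 - p)^{N - d}\, c\, p^{a - 1}(1 - p)^{b - 1}\,dp,$$

where $a, b > 0$, and $c = \Gamma(a + b)/[\Gamma(a)\Gamma(b)]$. First note that $\sum_{d=0}^{N} P\{D = d\} = 1$ so that

$$\sum_{d=0}^{N} \binom{N}{d} \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \frac{\Gamma(a + d)\Gamma(N + b - d)}{\Gamma(N + a + b)} = 1.$$

The Bayes estimator is given by

$$\delta^*(k) = \frac{\sum_d d \binom{d}{k}\binom{N - d}{n - k}\binom{N}{d}\Gamma(a + d)\Gamma(N + b - d)}{\sum_d \binom{d}{k}\binom{N - d}{n - k}\binom{N}{d}\Gamma(a + d)\Gamma(N + b - d)}.$$

A little simplification, writing $d = (d - k) + k$ and using

$$\binom{d}{k}\binom{N - d}{n - k}\binom{N}{d} = \binom{N - n}{d - k}\binom{N}{n}\binom{n}{k},$$

yields

$$\delta^*(k) = \frac{\sum_{i=0}^{N - n} (i + k)\binom{N - n}{i}\Gamma(a + k + i)\Gamma(N + b - k - i)}{\sum_{i=0}^{N - n} \binom{N - n}{i}\Gamma(a + k + i)\Gamma(N + b - k - i)} = k\,\frac{a + b + N}{a + b + n} + \frac{a(N - n)}{a + b + n}.$$

Now putting

$$\alpha = \frac{a + b + N}{a + b + n} \quad \text{and} \quad \beta = \frac{a(N - n)}{a + b + n}$$

and solving for $a$ and $b$, we get

$$a = \frac{\beta}{\alpha - 1} \quad \text{and} \quad b = \frac{N - \alpha n - \beta}{\alpha - 1}.$$

Since $a > 0$, $\beta > 0$, and since $b > 0$, $N > \alpha n + \beta$. Moreover, $\alpha > 1$ if $N > n + 1$. If $N = n + 1$, the result is obtained if we give $D$ a binomial distribution with parameter $p = \tfrac{1}{2}$. If $N = n$, the result is immediate.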
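
As a check on the constant-risk claim in Example 16, the risk $E_D(\alpha X + \beta - D)^2$ can be computed exactly from the hypergeometric PMF for every $D$. The sketch below assumes the coefficients $\alpha = N/[n + \sqrt{n(N - n)/(N - 1)}]$ and $\beta = (N/2)(1 - \alpha n/N)$, which follow from setting the coefficients of $D^2$ and $D$ in the risk to zero; the function names are hypothetical.

```python
from math import comb, sqrt

def hypergeom_risk(N, n, D, a, b):
    """Exact risk E_D (a*X + b - D)^2, X = number of defectives in the sample."""
    total = 0.0
    for k in range(max(0, n - (N - D)), min(n, D) + 1):
        prob = comb(D, k) * comb(N - D, n - k) / comb(N, n)
        total += prob * (a * k + b - D) ** 2
    return total

def constant_risk_coefficients(N, n):
    """(alpha, beta) making the risk of alpha*X + beta free of D."""
    a = N / (n + sqrt(n * (N - n) / (N - 1)))
    b = 0.5 * N * (1.0 - a * n / N)
    return a, b
```

With, say, $N = 20$ and $n = 5$, the risk returned by `hypergeom_risk` equals $\beta^2$ for every $D = 0, 1, \ldots, 20$, up to rounding.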


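The following theorem, which is an extension of Theorem 2, is of considerable help in proving minimaxity of various estimators.

Theorem 3. Let $\{\pi_k(\theta); k \ge 1\}$ be a sequence of prior distributions on $\Theta$ and let $\{\delta_k^*\}$ be the corresponding sequence of Bayes estimators with Bayes risks $R(\pi_k; \delta_k^*)$. If $\limsup_{k \to \infty} R(\pi_k; \delta_k^*) = r^*$ and there exists an estimator $\delta^*$ for which

$$\sup_{\theta \in \Theta} R(\theta, \delta^*) \le r^*,$$

then $\delta^*$ is minimax.

Proof. Suppose $\delta^*$ is not minimax. Then there exists an estimator $\tilde{\delta}$ such that

$$\sup_{\theta \in \Theta} R(\theta, \tilde{\delta}) < \sup_{\theta \in \Theta} R(\theta, \delta^*).$$

On the other hand, consider the Bayes estimators $\{\delta_k^*\}$ corresponding to the priors $\{\pi_k(\theta)\}$. We obtain

$$R(\pi_k, \delta_k^*) = \int R(\theta, \delta_k^*)\pi_k(\theta)\,d\theta \tag{12}$$

$$\le \int R(\theta, \tilde{\delta})\pi_k(\theta)\,d\theta \tag{13}$$

$$\le \sup_{\theta \in \Theta} R(\theta, \tilde{\delta}), \tag{14}$$

which contradicts $\sup_{\theta \in \Theta} R(\theta, \delta^*) \le r^*$. Hence $\delta^*$ is minimax.

Example 17. Let $X_1, \ldots, X_n$ be a sample of size $n$ from $N(\mu, 1)$. Then, the MLE of $\mu$ is $\bar{X}$ with variance $1/n$. We show that $\bar{X}$ is minimax. Let $\mu \sim N(0, \tau^2)$. Then the Bayes estimator of $\mu$ is $\bar{x}\left(\frac{n\tau^2}{1 + n\tau^2}\right)$. The Bayes risk of this estimator is $R(\pi, \delta_{\tau^2}) = \frac{1}{n}\left(\frac{n\tau^2}{1 + n\tau^2}\right)$. Now, as $\tau^2 \to \infty$, $R(\pi, \delta_{\tau^2}) \to \frac{1}{n}$, which is the risk of $\bar{X}$. Hence $\bar{X}$ is minimax.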
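
The Bayes risk in Example 17 can be cross-checked from first principles. For any estimator of the form $c\bar{X}$, the risk at $\mu$ is $c^2/n + (1 - c)^2\mu^2$, so its average over $\mu \sim N(0, \tau^2)$ is $c^2/n + (1 - c)^2\tau^2$. The sketch below, with illustrative function names, evaluates this exactly at the Bayes choice $c = n\tau^2/(1 + n\tau^2)$.

```python
from fractions import Fraction as F

def bayes_risk_of_shrinkage(n, tau2, c):
    """Bayes risk of c * Xbar under mu ~ N(0, tau2): c^2/n + (1 - c)^2 * tau2."""
    return c * c * F(1, n) + (1 - c) ** 2 * tau2

def bayes_factor(n, tau2):
    """Shrinkage factor n*tau2 / (1 + n*tau2) of the Bayes estimator."""
    return F(n * tau2) / (1 + n * tau2)
```

At the Bayes factor the value is $\frac{1}{n} \cdot \frac{n\tau^2}{1 + n\tau^2}$, which falls short of $1/n$ by $1/[n(1 + n\tau^2)]$ and so tends to $1/n$ as $\tau^2 \to \infty$.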


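Definition 10. A decision rule $\delta$ is inadmissible if there exists a $\delta^* \in D$ such that $R(\theta, \delta^*) \le R(\theta, \delta)$ where the inequality is strict for some $\theta \in \Theta$; otherwise $\delta$ is admissible.

Theorem 4. If $X_1, \ldots, X_n$ is a sample from $N(\theta, 1)$, then $\bar{X}$ is an admissible estimator of $\theta$ under squared error loss $L(\theta, a) = (\theta - a)^2$.

Proof. Clearly, $\bar{X} \sim N(\theta, \frac{1}{n})$. Suppose $\bar{X}$ is not admissible; then there exists another rule $\delta^*(x)$ such that $R(\theta, \delta^*) \le R(\theta, \bar{X})$ while the inequality is strict for some $\theta = \theta_0$ (say). Now, the risk $R(\theta, \delta^*)$ is a continuous function of $\theta$ and hence there exists an $\varepsilon > 0$ such that $R(\theta, \delta^*) < R(\theta, \bar{X}) - \varepsilon$ for $|\theta - \theta_0| < \varepsilon$.

Now consider the prior $N(0, \tau^2)$. Then the Bayes estimator is $\delta_\tau(\bar{X}) = \bar{X}\left(1 + \frac{1}{n\tau^2}\right)^{-1}$ with risk $\frac{1}{n}\left(\frac{n\tau^2}{1 + n\tau^2}\right)$. Thus,

$$R(\pi, \bar{X}) - R(\pi, \delta_\tau) = \frac{1}{n} - \frac{\tau^2}{1 + n\tau^2} = \frac{1}{n(1 + n\tau^2)}.$$

However,

$$\tau[R(\pi, \delta^*) - R(\pi, \bar{X})] = \tau \int [R(\theta, \delta^*) - R(\theta, \bar{X})] \frac{1}{\sqrt{2\pi}\,\tau} \exp\left\{-\frac{\theta^2}{2\tau^2}\right\}d\theta \le -\frac{\varepsilon}{\sqrt{2\pi}} \int_{\theta_0 - \varepsilon}^{\theta_0 + \varepsilon} \exp\left\{-\frac{\theta^2}{2\tau^2}\right\}d\theta.$$

Since $\delta_\tau$ is Bayes with respect to $\pi$, we get

$$0 \le \tau[R(\pi, \delta^*) - R(\pi, \delta_\tau)] = \tau[R(\pi, \delta^*) - R(\pi, \bar{X})] + \tau[R(\pi, \bar{X}) - R(\pi, \delta_\tau)] \le -\frac{\varepsilon}{\sqrt{2\pi}} \int_{\theta_0 - \varepsilon}^{\theta_0 + \varepsilon} \exp\left\{-\frac{\theta^2}{2\tau^2}\right\}d\theta + \frac{\tau}{n(1 + n\tau^2)}.$$

The right-hand side goes to $-\frac{2\varepsilon^2}{\sqrt{2\pi}}$ as $\tau \to \infty$, which is negative. This contradicts the left-hand inequality, so no such $\delta^*$ exists. Hence $\bar{X}$ is admissible under squared error loss.

Thus we have proved that $\bar{X}$ is an admissible minimax estimator of the mean of a normal distribution $N(\theta, 1)$.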


PROBLEMS 8.8 


1. It rains quite often in Bowling Green, Ohio. On a rainy day a teacher has essentially three choices: (1) to take an umbrella and face the possible prospect of carrying it around in the sunshine; (2) to leave the umbrella at home and perhaps get drenched; or (3) to just give up the lecture and stay at home. Let $\Theta = \{\theta_1, \theta_2\}$, where $\theta_1$ corresponds to rain, and $\theta_2$ to no rain. Let $A = \{a_1, a_2, a_3\}$, where $a_i$ corresponds to the choice $i$, $i = 1, 2, 3$. Suppose that the following table gives the losses for the decision problem:

             θ_1    θ_2
    a_1       1      2
    a_2       4      0
    a_3       5      5

The teacher has to make a decision on the basis of a weather report that depends on $\theta$ as follows.

                      θ_1    θ_2
    W_1 (Rain)        0.7    0.2
    W_2 (No rain)     0.3    0.8

Find the minimax rule to help the teacher reach a decision.
2. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. For estimating $\lambda$, using the quadratic error loss function, an a priori distribution over $\Theta$, given by PDF

$$\pi(\lambda) = e^{-\lambda} \ \text{if } \lambda > 0, \quad = 0 \ \text{otherwise},$$

is used:
(a) Find the Bayes estimator for $\lambda$.
(b) If it is required to estimate $\psi(\lambda) = e^{-\lambda}$ with the same loss function and same a priori PDF, find the Bayes estimator for $\psi(\lambda)$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $b(1, \theta)$. Consider the class of decision rules $\delta$ of the form $\delta(x_1, x_2, \ldots, x_n) = n^{-1}\sum_{i=1}^{n} x_i + \alpha$, where $\alpha$ is a constant to be determined. Find $\alpha$ according to the minimax principle, using the loss function $(\theta - \delta)^2$, where $\delta$ is an estimator for $\theta$.

4. Let $\delta^*$ be a minimax estimator for $\psi(\theta)$ with respect to the squared error loss function. Show that $a\delta^* + b$ ($a, b$ constants) is a minimax estimator for $a\psi(\theta) + b$.

5. Let $X \sim b(n, \theta)$, and suppose that the a priori PDF of $\theta$ is $U(0, 1)$. Find the Bayes estimator of $\theta$, using loss function $L(\theta, \delta) = (\theta - \delta)^2/[\theta(1 - \theta)]$. Find a minimax estimator for $\theta$.

6. In Example 5 find the Bayes estimator for $p^2$.

7. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(1, 1/\lambda)$. To estimate $\lambda$, let the a priori PDF on $\lambda$ be $\pi(\lambda) = e^{-\lambda}$, $\lambda > 0$, and let the loss function be squared error. Find the Bayes estimator of $\lambda$.

8. Let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ RVs. Suppose the prior distribution of $\theta$ is a Pareto PDF $\pi(\theta) = \frac{\alpha a^\alpha}{\theta^{\alpha + 1}}$ for $\theta \ge a$, $= 0$ for $\theta < a$. Using the quadratic loss function find the Bayes estimator of $\theta$.

9. Let $T$ be the unique Bayes estimator of $\theta$ with respect to the prior density $\pi$. Then $T$ is admissible.

10. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \exp\{-(x - \theta)\}$, $x > \theta$. Take $\pi(\theta) = e^{-\theta}$, $\theta > 0$. Find the Bayes estimator of $\theta$ under quadratic loss.

11. For the PDF of Problem 10 consider the estimation of $\theta$ under quadratic loss. Consider the class of estimators $a\left(X_{(1)} - \frac{1}{n}\right)$ for all $a > 0$. Show that $X_{(1)} - 1/n$ is minimax in this class.


8.9. PRINCIPLE OF EQUIVARIANCE


Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a family of distributions of some RV $X$. Let $\mathcal{X} \subseteq \mathcal{R}_n$ be the sample space of values of $X$. In Section 8.8 we saw that statistical decision theory revolves around the following four basic elements: the parameter space $\Theta$, the action space $A$, the sample space $\mathcal{X}$, and the loss function $L(\theta, a)$.

Let $G$ be a group of transformations which map $\mathcal{X}$ onto itself. We say that $\mathcal{P}$ is invariant under $G$ if for each $g \in G$ and every $\theta \in \Theta$, there is a unique $\theta' = \bar{g}\theta \in \Theta$ such that $g(X) \sim P_{\bar{g}\theta}$ whenever $X \sim P_\theta$. Accordingly,

$$P_\theta\{g(X) \in A\} = P_{\bar{g}\theta}\{X \in A\} \tag{1}$$

for all Borel subsets in $\mathcal{R}_n$. We note that the invariance of $\mathcal{P}$ under $G$ does not change the class of distributions we begin with; it only changes the parameter or index $\theta$ to $\bar{g}\theta$. The group $G$ induces $\bar{G}$, a group of transformations $\bar{g}$ on $\Theta$ onto itself.

Example 1. Let $X \sim b(n,p)$, $0 < p < 1$. Let $G = \{g, e\}$, where $g(x) = n - x$, and $e(x) = x$. Then $gg^{-1} = e$. Clearly, $g(X) \sim b(n, 1 - p)$ so that $\bar{g}p = 1 - p$ and $\bar{e}p = p$. The group $G$ leaves $\{b(n,p); 0 < p < 1\}$ invariant.

Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Consider the affine group of transformations $G = \{\{a, b\}, a \in \mathcal{R}, b > 0\}$ on $\mathcal{X}$. The joint PDF of $\{a, b\}X = (a + bX_1, \ldots, a + bX_n)$ is given by

$$f(x_1, x_2, \ldots, x_n) = \frac{1}{(b\sigma\sqrt{2\pi})^n} \exp\left\{-\frac{1}{2b^2\sigma^2}\sum_{i=1}^{n}(x_i - a - b\mu)^2\right\}$$

and we see that

$$\bar{g}(\mu, \sigma) = (a + b\mu, b\sigma) = \{a, b\}\{\mu, \sigma\}.$$

Clearly $G$ leaves the family of joint PDFs of $X$ invariant.


In order to apply invariance considerations to a decision problem we need also to ensure that the loss function is invariant.

Definition 1. A decision problem is said to be invariant under a group $G$ if

(i) $\mathcal{P}$ is invariant under $G$ and

(ii) the loss function $L$ is invariant in the sense that for every $g \in G$ and $a \in A$ there is a unique $a' \in A$ such that

$$L(\theta, a) = L(\bar{g}\theta, a') \quad \text{for all } \theta.$$

The $a' \in A$ in Definition 1 is uniquely determined by $g$ and may be denoted by $\tilde{g}(a)$. One can show that $\tilde{G} = \{\tilde{g} : g \in G\}$ is a group of transformations of $A$ into itself.

Example 3. Consider the estimation of $\mu$ in sampling from $N(\mu, 1)$. In Example 8.9.2 we have shown that the normal family is invariant under the location group $G = \{\{b, 1\}, -\infty < b < \infty\}$. Consider the quadratic loss function

$$L(\mu, a) = (\mu - a)^2.$$

Then, $\{b, 1\}a = b + a$ and $\{b, 1\}\{\mu, 1\} = \{b + \mu, 1\}$. Hence,

$$L(\{b, 1\}\mu, \{b, 1\}a) = [(b + \mu) - (b + a)]^2 = (\mu - a)^2 = L(\mu, a).$$

Thus $L(\mu, a)$ is invariant under $G$ and the problem of estimation of $\mu$ is invariant under the group $G$.

Example 4. Consider the normal family $N(0, \sigma^2)$ which is invariant under the scale group $G = \{\{0, c\}, c > 0\}$. Let the loss function be

$$L(\sigma^2, a) = \frac{1}{\sigma^4}(\sigma^2 - a)^2.$$

Now, $\{0, c\}a = ca$ and $\{0, c\}\{0, \sigma^2\} = \{0, c\sigma^2\}$ and

$$L[\{0, c\}\sigma^2, \{0, c\}a] = \frac{1}{c^2\sigma^4}(c\sigma^2 - ca)^2 = \frac{1}{\sigma^4}(\sigma^2 - a)^2 = L(\sigma^2, a).$$

Thus, the loss function $L(\sigma^2, a)$ is invariant under $G = \{\{0, c\}, c > 0\}$ and the problem of estimation of $\sigma^2$ is invariant.

Example 5. Consider the loss function

$$L(\sigma^2, a) = \frac{a}{\sigma^2} - 1 - \log\frac{a}{\sigma^2}$$

for the estimation of $\sigma^2$ from the normal family $N(0, \sigma^2)$. We show that this loss function is invariant under the scale group. Since

$$\{0, c\}\sigma^2 = \{0, c\sigma^2\} \quad \text{and} \quad \{0, c\}\{0, a\} = \{0, ca\},$$

we have

$$L[\{0, c\}\sigma^2, \{0, c\}a] = \frac{ca}{c\sigma^2} - 1 - \log\frac{ca}{c\sigma^2} = L(\sigma^2, a).$$


Let us now return to the problem of estimation of a parametric function $\psi : \Theta \to \mathcal{R}$. For convenience let us take $\Theta \subseteq \mathcal{R}$ and $\psi(\theta) = \theta$. Then $A = \Theta$ and $\tilde{G} = \bar{G}$.

Suppose $\theta$ is the mean of PDF $f_\theta$, $G = \{\{b, 1\}, b \in \mathcal{R}\}$, and $\{f_\theta\}$ is invariant under $G$. Consider the estimator $\delta(X) = \bar{X}$. What we want in an estimator $\delta^*$ of $\theta$ is that it changes in the same prescribed way as the data are changed. In our case, since $X$ changes to $\{b, 1\}X = X + b$ we would like $\bar{X}$ to transform to $\{b, 1\}\bar{X} = \bar{X} + b$.

Definition 2. An estimator $\delta(X)$ of $\theta$ is said to be equivariant, under $G$, if

$$\delta(gX) = \bar{g}\delta(X) \quad \text{for all } g \in G, \tag{2}$$

where we have written $gX$ for $g(X)$ for convenience.

Indeed $g$ on $\mathcal{X}$ induces $\bar{g}$ on $\Theta$. Thus if $X \sim f_\theta$, then $gX \sim f_{\bar{g}\theta}$, so if $\delta(X)$ estimates $\theta$ then $\delta(gX)$ should estimate $\bar{g}\theta$. The principle of equivariance requires that we restrict attention to equivariant estimators and select the "best" estimator in this class in a sense to be described later in this section.

Example 6. In Example 3, consider the estimators $\delta_1(X) = \bar{X}$, $\delta_2(X) = (X_{(1)} + X_{(n)})/2$, and $\delta_3(X) = \alpha\bar{X}$, $\alpha$ a fixed real number. Then $G = \{\{b, 1\}, -\infty < b < \infty\}$ induces $\bar{G} = G$ on $\Theta$ and both $\delta_1$, $\delta_2$ are equivariant under $G$. The estimator $\delta_3$ is not equivariant unless $\alpha = 1$. In Example 8.9.1 $\delta(X) = X/n$ is an equivariant estimator of $p$.

In Example 6 consider the statistic $\delta_4(X) = S^2$. Note that under the translation group $\{b, 1\}X = X + b$ and $\delta_4(\{b, 1\}X) = \delta_4(X)$. That is, for every $g \in G$, $\delta_4(gX) = \delta_4(X)$. A statistic $\delta$ is said to be invariant under a group of transformations $G$ if $\delta(gX) = \delta(X)$ for all $g \in G$. When $G$ is the translation group, an invariant statistic (function) under $G$ is called location-invariant. Similarly if $G$ is the scale group, we call $\delta$ scale-invariant and if $G$ is the location-scale group, we call $\delta$ location-scale invariant. In Example 6 $\delta_4(X) = S^2$ is location-invariant but not equivariant, and $\delta_2(X)$ and $\delta_3(X)$ are not location-invariant.
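
The distinction between equivariant and invariant statistics in Example 6 can be spot-checked on a data set. The sketch below tests the defining identities for one sample and one shift $b$ only, so it illustrates the definitions rather than proving anything; the helper names are hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction as F
from statistics import mean, variance

def shift(xs, b):
    """Apply the translation {b, 1} to every observation."""
    return [x + b for x in xs]

def midrange(xs):
    return (min(xs) + max(xs)) / 2

def is_equivariant(stat, xs, b):
    """Location equivariance at this sample and shift: stat(x + b) = stat(x) + b."""
    return stat(shift(xs, b)) == stat(xs) + b

def is_invariant(stat, xs, b):
    """Location invariance at this sample and shift: stat(x + b) = stat(x)."""
    return stat(shift(xs, b)) == stat(xs)

xs = [F(2), F(7), F(11), F(4)]
b = F(5)
```

On this sample the mean and the midrange pass `is_equivariant`, the sample variance passes `is_invariant` but not `is_equivariant`, and `lambda v: 2 * mean(v)` (the case $\alpha = 2$ of $\delta_3$) passes neither.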

A very important property of equivariant estimators is that their risk function is constant on orbits of $\theta$.

Theorem 1. Suppose $\delta$ is an equivariant estimator of $\theta$ in a problem which is invariant under $G$. Then the risk function of $\delta$ satisfies

$$R(\bar{g}\theta, \delta) = R(\theta, \delta) \tag{3}$$

for all $\theta \in \Theta$ and $g \in G$. If, in particular, $\bar{G}$ is transitive over $\Theta$, then $R(\theta, \delta)$ is independent of $\theta$.

Proof. We have for $\theta \in \Theta$ and $g \in G$

$$R(\theta, \delta(X)) = E_\theta L(\theta, \delta(X))$$
$$= E_\theta L(\bar{g}\theta, \bar{g}\delta(X)) \quad \text{(invariance of } L)$$
$$= E_\theta L(\bar{g}\theta, \delta(g(X))) \quad \text{(equivariance of } \delta)$$
$$= E_{\bar{g}\theta} L(\bar{g}\theta, \delta(X)) \quad \text{(invariance of } \{P_\theta\})$$
$$= R(\bar{g}\theta, \delta(X)).$$

In the special case when $\bar{G}$ is transitive over $\Theta$ then for any $\theta_1, \theta_2 \in \Theta$, there exists a $\bar{g} \in \bar{G}$ such that $\theta_2 = \bar{g}\theta_1$. It follows that

$$R(\theta_2, \delta) = R(\bar{g}\theta_1, \delta) = R(\theta_1, \delta)$$

so that $R$ is independent of $\theta$.

Remark 1. When the risk function of every equivariant estimator is constant, an estimator (in the class of equivariant estimators) which is obtained by minimizing the constant is called the minimum risk equivariant (MRE) estimator.


Example 7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF

$$f(x, \theta) = \exp\{-(x - \theta)\}, \ x \ge \theta, \quad \text{and} \quad = 0, \ \text{if } x < \theta.$$

Consider the location group $G = \{\{b, 1\}, -\infty < b < \infty\}$ which induces $\bar{G}$ on $\Theta$ where $\bar{G} = G$. Clearly $\bar{G}$ is transitive. Let $L(\theta, \delta) = (\theta - \delta)^2$. Then the problem of estimation of $\theta$ is invariant and according to Theorem 1 the risk of every equivariant estimator is free of $\theta$. The estimator $\delta_0(X) = X_{(1)} - \frac{1}{n}$ is equivariant under $G$ since

$$\delta_0(\{b, 1\}X) = \min_{1 \le i \le n}(X_i + b) - \frac{1}{n} = b + X_{(1)} - \frac{1}{n} = b + \delta_0(X).$$

We leave the reader to check that

$$R(\theta, \delta_0) = E_\theta\left(X_{(1)} - \frac{1}{n} - \theta\right)^2 = \frac{1}{n^2},$$

and it will be seen later that $\delta_0$ is the MRE estimator of $\theta$.
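
The value $R(\theta, \delta_0) = 1/n^2$ and its independence of $\theta$ can be illustrated by simulation. The sketch below is a Monte Carlo approximation with a fixed seed, not an exact calculation; the number of replications and the function name are arbitrary choices.

```python
import random

def simulated_risk(theta, n, reps=100_000, seed=0):
    """Monte Carlo estimate of E_theta (X_(1) - 1/n - theta)^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # X_i = theta + E_i with E_i standard exponential, so X_i has PDF exp{-(x - theta)}.
        x_min = theta + min(rng.expovariate(1.0) for _ in range(n))
        total += (x_min - 1.0 / n - theta) ** 2
    return total / reps
```

For $n = 5$ the estimate should be close to $1/25 = 0.04$, and using the same seed with a different $\theta$ reproduces the same value up to rounding, as Theorem 1 predicts for an equivariant estimator.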


Example 8. In this example we consider sampling from a normal PDF. Let us first con- 
sider estimation of js when o = 1. Let § = {{b,1}, —a < b < co}. Then 0(X) =X is 
equivariant under G and it has the smallest risk 1/n. Note that {x, 1}~' = {—x, 1} may be 
used to designate x on its orbits 


{%,1}—1x = (x1 —%,...,% —%) = A(x). 


Clearly A(x) is invariant under S and A(X) is ancillary to 4. By Basu’s theorem A(X) 
and X are independent. 

Next consider estimation of 0? with p: = 0 and G = {{0,c},c > O}. Then S2 = $7) X? 
is an equivariant estimator of a7. Note that {0,s,}~! may be used to designate x on its 
orbits 


| x] Xn 
{0,5,} x= (2.....8) = A(x). 
Again A(x) is invariant under § and A(X) is ancillary to 0”. Moreover, S? and A(X) are 
independent. 
Finally, we consider estimation of (1,07) when § = {{b,c}, —a <b < 00, c > O}. 
Then (X,S2), where S? = )7}(X; — X)? is an equivariant estimator of (j.,07). Also 
{x,s,}~! may be used to designate x on its orbits 


ee ee (2... 2) Sa, 


Sx Sx 


Note that the statistic A(X) defined in each of the three cases considered in Example 8 
is constant on its orbits. A statistic A is said to be maximal invariant if 
(i) A is invariant, and 
(ii) A is maximal, that is, A(x,) = A(x2) => x; = g(xX2) for some g € S. 

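We now derive an explicit expression for the MRE estimator of a location parameter. Let
$X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x) = f(x - \theta)$, $-\infty < \theta < \infty$. Then $\{f_\theta : \theta \in \Theta\}$
is invariant under $\mathcal{G} = \{\{b,1\}, -\infty < b < \infty\}$ and an estimator $\delta$ of $\theta$ is equivariant if
\[ \delta(\{b,1\}\mathbf{X}) = \delta(\mathbf{X}) + b \]
for all real $b$.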
Lemma 1. An estimator $\delta$ is equivariant for $\theta$ if and only if
\[ \delta(\mathbf{X}) = X_1 + q(X_2 - X_1, \ldots, X_n - X_1), \tag{4} \]
for some function $q$.

Proof. If (4) holds then
\[ \delta(\{b,1\}\mathbf{x}) = b + x_1 + q(x_2 - x_1, \ldots, x_n - x_1) = b + \delta(\mathbf{x}). \]
Conversely,
\[ \delta(\mathbf{x}) = \delta(x_1 + x_1 - x_1, x_1 + x_2 - x_1, \ldots, x_1 + x_n - x_1) = x_1 + \delta(0, x_2 - x_1, \ldots, x_n - x_1), \]
which is (4) with $q(x_2 - x_1, \ldots, x_n - x_1) = \delta(0, x_2 - x_1, \ldots, x_n - x_1)$.
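From Theorem 1 the risk function of an equivariant estimator $\delta$ is constant with risk
\[ R(\theta,\delta) = R(0,\delta) = E_0[\delta(\mathbf{X})]^2, \quad \text{for all } \theta, \]
where the expectation is with respect to PDF $f_0(x) = f(x)$. Consequently, among all
equivariant estimators $\delta$ for $\theta$, the MRE estimator is $\delta_0$ satisfying
\[ R(0,\delta_0) = \min_{\delta} R(0,\delta). \]
Thus we only need to choose the function $q$ in (4).

Let $L(\theta,\delta)$ be the loss function. Invariance considerations require that
\[ L(\theta,\delta) = L(\bar{g}\theta, \bar{g}\delta) = L(\theta + b, \delta + b) \]
for all real $b$, so that $L(\theta,\delta)$ must be some function $w$ of $\delta - \theta$.

Let $Y_i = X_i - X_1$, $i = 2, \ldots, n$, $\mathbf{Y} = (Y_2, \ldots, Y_n)$, and $g(\mathbf{y})$ be the joint PDF of $\mathbf{Y}$ under
$\theta = 0$. Let $h(x_1 \mid \mathbf{y})$ be the conditional density, under $\theta = 0$, of $X_1$ given $\mathbf{Y} = \mathbf{y}$. Writing the
function $q$ of (4) as $-q$ for convenience, we have
\[ R(0,\delta) = E_0[w(X_1 - q(\mathbf{Y}))] = \int \left\{ \int w(x_1 - q(\mathbf{y}))\, h(x_1 \mid \mathbf{y})\, dx_1 \right\} g(\mathbf{y})\, d\mathbf{y}. \tag{5} \]
Then $R(0,\delta)$ will be minimized by choosing, for each fixed $\mathbf{y}$, $q(\mathbf{y})$ to be that value
of $c$ which minimizes
\[ \int w(u - c)\, h(u \mid \mathbf{y})\, du. \tag{6} \]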
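Necessarily $q$ depends on $\mathbf{y}$. In the special case $w(d - \theta) = (d - \theta)^2$, the integral in (6) is
minimum when $c$ is chosen to be the mean of the conditional distribution. Thus the unique
MRE estimator of $\theta$ is given by
\[ \delta_0(\mathbf{x}) = x_1 - E_0\{X_1 \mid \mathbf{Y} = \mathbf{y}\}. \tag{7} \]
This is the so-called Pitman estimator. Let us simplify it a little more by computing
$E_0\{x_1 - X_1 \mid \mathbf{Y} = \mathbf{y}\}$.

First we need to compute $h(u \mid \mathbf{y})$. When $\theta = 0$, the joint PDF of $X_1, Y_2, \ldots, Y_n$ is easily
seen to be
\[ f(x_1) f(x_1 + y_2) \cdots f(x_1 + y_n), \]
so the joint PDF of $(Y_2, \ldots, Y_n)$ is given by
\[ \int_{-\infty}^{\infty} f(u) f(u + y_2) \cdots f(u + y_n)\, du. \]
It follows that
\[ h(u \mid \mathbf{y}) = \frac{f(u) f(u + y_2) \cdots f(u + y_n)}{\int_{-\infty}^{\infty} f(u) f(u + y_2) \cdots f(u + y_n)\, du}. \tag{8} \]
Now let $Z = x_1 - X_1$. Then the conditional PDF of $Z$ given $\mathbf{y}$ is $h(x_1 - z \mid \mathbf{y})$. It follows
from (8) that
\[ \delta_0(\mathbf{x}) = E_0\{Z \mid \mathbf{y}\} = \int_{-\infty}^{\infty} z\, h(x_1 - z \mid \mathbf{y})\, dz = \frac{\int_{-\infty}^{\infty} z \prod_{j=1}^{n} f(x_j - z)\, dz}{\int_{-\infty}^{\infty} \prod_{j=1}^{n} f(x_j - z)\, dz}. \tag{9} \]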
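Formula (9) is a ratio of two one-dimensional integrals, so it can be approximated numerically for any known density $f$. The sketch below uses a plain trapezoidal rule; the function names, the integration limits, and the grid size are assumptions made for illustration, and the limits must be wide enough to cover the region where the likelihood is non-negligible.

```python
import math

def pitman_location(xs, f, lo, hi, steps=20_000):
    """Approximate (9): ratio of the integrals of z*prod f(x_j - z) and prod f(x_j - z)."""
    h = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        like = 0.5 if i in (0, steps) else 1.0   # trapezoidal end-point weights
        for x in xs:
            like *= f(x - z)
        num += z * like
        den += like
    return num / den                             # the step size h cancels in the ratio

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

if __name__ == "__main__":
    xs = [0.3, -1.2, 2.5, 0.8]
    # For the N(theta, 1) family the Pitman estimator is the sample mean.
    print(pitman_location(xs, std_normal_pdf, -20.0, 20.0), sum(xs) / len(xs))
```

For the normal density the numerical value should match the sample mean, which gives a simple sanity check before trying densities without a closed form.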


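Remark 2. Since the joint PDF of $X_1, X_2, \ldots, X_n$ is $\prod_{j=1}^{n} f_\theta(x_j) = \prod_{j=1}^{n} f(x_j - \theta)$, the joint
PDF of $\theta$ and $\mathbf{X}$ when $\theta$ has prior $\pi(\theta)$ is $\pi(\theta) \prod_{j=1}^{n} f(x_j - \theta)$. The joint marginal of $\mathbf{X}$ is
$\int_{-\infty}^{\infty} \pi(\theta) \prod_{j=1}^{n} f(x_j - \theta)\, d\theta$. It follows that the conditional PDF of $\theta$ given $\mathbf{X} = \mathbf{x}$ is given by
\[ \frac{\pi(\theta) \prod_{j=1}^{n} f(x_j - \theta)}{\int_{-\infty}^{\infty} \pi(\theta) \prod_{j=1}^{n} f(x_j - \theta)\, d\theta}. \]
Taking $\pi(\theta) = 1$, the improper uniform prior on $\Theta$, we see from (9) that $\delta_0(\mathbf{x})$ is the Bayes
estimator of $\theta$ under squared error loss and prior $\pi(\theta) = 1$. Since the risk of $\delta_0$ is constant,
it follows that $\delta_0$ is also a minimax estimator of $\theta$.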
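Remark 3. Suppose $S$ is sufficient for $\theta$. Then $\prod_{j=1}^{n} f_\theta(x_j) = g_\theta(s) h(\mathbf{x})$, so that the Pitman
estimator of $\theta$ can be rewritten as
\[ \delta_0(\mathbf{x}) = \frac{\int_{-\infty}^{\infty} \theta \prod_{j=1}^{n} f_\theta(x_j)\, d\theta}{\int_{-\infty}^{\infty} \prod_{j=1}^{n} f_\theta(x_j)\, d\theta}
 = \frac{\int_{-\infty}^{\infty} \theta\, g_\theta(s) h(\mathbf{x})\, d\theta}{\int_{-\infty}^{\infty} g_\theta(s) h(\mathbf{x})\, d\theta}
 = \frac{\int_{-\infty}^{\infty} \theta\, g_\theta(s)\, d\theta}{\int_{-\infty}^{\infty} g_\theta(s)\, d\theta}, \]
which is a function of $s$ alone.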


Examples 7 and 8 (continued). A direct computation using (9) shows that $X_{(1)} - 1/n$ is the
Pitman MRE estimator of $\theta$ in Example 7 and $\bar{X}$ is the MRE estimator of $\mu$ in Example 8
(when $\sigma = 1$). The results can also be obtained by using sufficiency reduction. In Example 7,
$X_{(1)}$ is the minimal sufficient statistic for $\theta$. Every (translation) equivariant function based
on $X_{(1)}$ must be of the form $\delta_c(\mathbf{X}) = X_{(1)} + c$, where $c$ is a real number. Then
\[ R(\theta,\delta_c) = E_\theta\{X_{(1)} + c - \theta\}^2 = R(\theta,\delta_0) + (c + 1/n)^2 = (1/n)^2 + (c + 1/n)^2, \]
which is minimized for $c = -1/n$. In Example 8, $\bar{X}$ is the minimal sufficient statistic so
every equivariant function of $\bar{X}$ must be of the form $\delta_c(\mathbf{X}) = \bar{X} + c$, where $c$ is a real
constant. Then
\[ R(\mu,\delta_c) = E_\mu(\bar{X} + c - \mu)^2 = \frac{1}{n} + c^2, \]
which is minimized for $c = 0$.


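Example 9. Let $X_1, X_2, \ldots, X_n$ be iid $U(\theta - 1/2, \theta + 1/2)$. Then $(X_{(1)}, X_{(n)})$ is jointly
sufficient for $\theta$. Clearly,
\[ \prod_{j=1}^{n} f(x_j - \theta) = \begin{cases} 1, & x_{(n)} - \frac{1}{2} < \theta < x_{(1)} + \frac{1}{2}, \\ 0, & \text{otherwise}, \end{cases} \]
so that the Pitman estimator of $\theta$ is given by
\[ \delta_0(\mathbf{x}) = \frac{\int_{x_{(n)} - 1/2}^{x_{(1)} + 1/2} \theta\, d\theta}{\int_{x_{(n)} - 1/2}^{x_{(1)} + 1/2} d\theta} = \frac{x_{(1)} + x_{(n)}}{2}. \]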
We now consider, briefly, the Pitman estimator of a scale parameter. Let $\mathbf{X}$ have a joint
PDF
\[ f_\sigma(\mathbf{x}) = \frac{1}{\sigma^n} f\left(\frac{x_1}{\sigma}, \ldots, \frac{x_n}{\sigma}\right), \]
where $f$ is known and $\sigma > 0$ is a scale parameter. The family $\{f_\sigma : \sigma > 0\}$ remains invariant
under $\mathcal{G} = \{\{0,c\}, c > 0\}$, which induces $\bar{\mathcal{G}} = \mathcal{G}$ on $\Theta$. Then for estimation of $\sigma^k$ the loss
function $L(\sigma,a)$ is invariant under these transformations if and only if $L(\sigma,a) = w(a/\sigma^k)$.
An estimator $\delta$ of $\sigma^k$ is equivariant under $\mathcal{G}$ if
\[ \delta(\{0,c\}\mathbf{X}) = c^k \delta(\mathbf{X}) \quad \text{for all } c > 0. \]
Some simple examples of scale-equivariant estimators of $\sigma$ are the mean deviation
$\sum_{i=1}^{n} |X_i - \bar{X}|/n$ and the standard deviation $\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2/(n-1)}$. We note that the group
$\bar{\mathcal{G}}$ over $\Theta$ is transitive so, according to Theorem 1, the risk of any equivariant estimator of
$\sigma^k$ is free of $\sigma$, and an MRE estimator minimizes this risk over the class of all equivariant
estimators of $\sigma^k$. Using the loss function $L(\sigma,a) = w(a/\sigma^k) = (a - \sigma^k)^2/\sigma^{2k}$ it can be
shown that the MRE estimator of $\sigma^k$, also known as the Pitman estimate of $\sigma^k$, is given by
\[ \delta_0(\mathbf{x}) = \frac{\int_0^\infty v^{n+k-1} f(v x_1, \ldots, v x_n)\, dv}{\int_0^\infty v^{n+2k-1} f(v x_1, \ldots, v x_n)\, dv}. \]
Just as in the location case one can show that $\delta_0$ is a function of the minimal sufficient
statistic and $\delta_0$ is the Bayes estimator of $\sigma^k$ with improper prior $\pi(\sigma) = 1/\sigma^{2k+1}$.
Consequently, $\delta_0$ is minimax.


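Example 8 (continued). In Example 8, the Pitman estimator of $\sigma^k$ is easily shown to be
\[ \delta_0(\mathbf{X}) = \frac{\Gamma\left(\frac{n+k}{2}\right)}{\Gamma\left(\frac{n+2k}{2}\right)} \left( \frac{\sum_{i=1}^{n} X_i^2}{2} \right)^{k/2}. \]
Thus the MRE estimator of $\sigma$ is given by $\left\{ \Gamma\left(\frac{n+1}{2}\right) \big/ \Gamma\left(\frac{n+2}{2}\right) \right\} \sqrt{\sum_{i=1}^{n} X_i^2/2}$ and that of $\sigma^2$ by
$\sum_{i=1}^{n} X_i^2/(n+2)$.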
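The closed form for the normal scale case can be compared with a direct numerical evaluation of the ratio of integrals that defines the Pitman scale estimator. The sketch below is only an illustration: the function names, the upper integration limit, and the grid size are assumptions, and the kernel $\exp(-v^2 \sum x_i^2/2)$ is the $N(0,1)$ joint density at $v\mathbf{x}$ up to a constant that cancels in the ratio.

```python
import math

def pitman_scale_normal(xs, k):
    """Closed form Gamma((n+k)/2) / Gamma((n+2k)/2) * (sum x_i^2 / 2)^(k/2)."""
    n = len(xs)
    s = sum(x * x for x in xs)
    # Working with log-gamma avoids overflow when n is large.
    log_ratio = math.lgamma((n + k) / 2) - math.lgamma((n + 2 * k) / 2)
    return math.exp(log_ratio) * (s / 2) ** (k / 2)

def pitman_scale_numeric(xs, k, upper=20.0, steps=200_000):
    """Midpoint-rule approximation of the ratio of integrals over (0, upper)."""
    n = len(xs)
    s = sum(x * x for x in xs)
    h = upper / steps
    num = den = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        kernel = math.exp(-0.5 * v * v * s)   # f(v x) up to a cancelling constant
        num += v ** (n + k - 1) * kernel
        den += v ** (n + 2 * k - 1) * kernel
    return num / den

if __name__ == "__main__":
    xs = [1.0, -2.0, 0.5, 1.5]
    for k in (1, 2):
        print(k, pitman_scale_normal(xs, k), pitman_scale_numeric(xs, k))
```

With $k = 2$ the closed form reduces to $\sum x_i^2/(n+2)$, which gives a second check on the output.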


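Example 10. Let $X_1, X_2, \ldots, X_n$ be iid $U(0,\theta)$. The Pitman estimator of $\theta$ is given by
\[ \delta_0(\mathbf{X}) = \frac{\int_0^{1/X_{(n)}} v^{n}\, dv}{\int_0^{1/X_{(n)}} v^{n+1}\, dv} = \frac{n+2}{n+1}\, X_{(n)}. \]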
PROBLEMS 8.9 
In all problems assume that $X_1, X_2, \ldots, X_n$ is a random sample from the distribution under
consideration.


1. Show that the following statistics are equivariant under translation group:
(a) Median $(X_i)$.
(c) $X_{([np]+1)}$, the quantile of order $p$, $0 < p < 1$.
(d) $(X_{(r)} + X_{(r+1)} + \cdots + X_{(n-r)})/(n - 2r)$.
(e) $\bar{X} + \bar{Y}$, where $\bar{Y}$ is the mean of a sample of size $m$, $m \ne n$.
2. Show that the following statistics are invariant under location or scale or location-scale
group:
(a) $\bar{X} - \text{median}(X_i)$.
(b) $X_{(n+1-k)} - X_{(k)}$.
(c) $\sum_{i=1}^{n} |X_i - \bar{X}|/n$.
(d) $\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\left\{ \sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right\}^{1/2}}$, where $(X_1,Y_1), \ldots, (X_n,Y_n)$ is a random sample from
a bivariate distribution.


3. Let the common distribution be $G(\alpha,\sigma)$ where $\alpha$ ($> 0$) is known and $\sigma > 0$ is
unknown. Find the MRE estimator of $\sigma$ under loss $L(\sigma,a) = (1 - a/\sigma)^2$.


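4. Let the common PDF be the folded normal distribution
\[ \sqrt{\frac{2}{\pi}} \exp\left\{ -\frac{1}{2}(x - \mu)^2 \right\} I_{[\mu,\infty)}(x). \]
Verify that the best equivariant estimator of $\mu$ under quadratic loss is given by
\[ \hat{\mu} = \bar{X} - \frac{\exp\left\{ -\frac{n}{2}(X_{(1)} - \bar{X})^2 \right\}}{\sqrt{2n\pi} \left\{ \int_{-\infty}^{\sqrt{n}(X_{(1)} - \bar{X})} \frac{1}{\sqrt{2\pi}} \exp(-z^2/2)\, dz \right\}}. \]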


5. Let $X \sim U(\theta, 2\theta)$.
(a) Show that $(X_{(1)}, X_{(n)})$ is jointly sufficient statistic for $\theta$.
(b) Verify whether or not $(X_{(n)} - X_{(1)})$ is an unbiased estimator of $\theta$. Find an
ancillary statistic.
(c) Determine the best invariant estimator of $\theta$ under the loss function
$L(\theta,a) = \left(1 - \frac{a}{\theta}\right)^2$.


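6. Let
\[ f_\theta(x) = \frac{1}{2} \exp\{-|x - \theta|\}. \]
Find the Pitman estimator of $\theta$.
7. Let $f_\theta(x) = \exp\{-(x - \theta)\} \cdot [1 + \exp\{-(x - \theta)\}]^{-2}$, for $x \in \mathcal{R}$, $\theta \in \mathcal{R}$. Find the Pitman
estimator of $\theta$.
8. Show that an estimator $\delta$ is (location) equivariant if and only if
\[ \delta(\mathbf{x}) = \delta_0(\mathbf{x}) + \phi(\mathbf{x}), \]
where $\delta_0$ is any equivariant estimator and $\phi$ is an invariant function.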
9. Let $X_1, X_2$ be iid with PDF
\[ f_\sigma(x) = \frac{2}{\sigma}\left(1 - \frac{x}{\sigma}\right), \quad 0 < x < \sigma, \quad \text{and } = 0 \text{ otherwise}. \]

Find, explicitly, the Pitman estimator of 0”. 
10. Let $X_1, X_2, \ldots, X_n$ be iid with PDF
\[ f_\theta(x) = \frac{1}{\theta} \exp(-x/\theta), \quad x > 0, \quad \text{and } = 0 \text{ otherwise}. \]


Find the Pitman estimator of 6°. 


9 


NEYMAN-PEARSON THEORY OF 
TESTING OF HYPOTHESES 


9.1 INTRODUCTION 


Let $X_1, X_2, \ldots, X_n$ be a random sample from a population distribution $F_\theta$, $\theta \in \Theta$, where the
functional form of $F_\theta$ is known except, perhaps, for the parameter $\theta$. Thus, for example, the
$X_i$'s may be a random sample from $N(\theta, 1)$, where $\theta \in \mathcal{R}$ is not known. In many practical
problems the experimenter is interested in testing the validity of an assertion about the
unknown parameter $\theta$. For example, in a coin-tossing experiment it is of interest to test,
in some sense, whether the (unknown) probability of heads $p$ equals a given number $p_0$,
$0 < p_0 < 1$. Similarly, it is of interest to check the claim of a car manufacturer about
the average mileage per gallon of gasoline achieved by a particular model. A problem of
this type is usually referred to as a problem of testing of hypotheses and is the subject of
discussion in this chapter. We will develop the fundamentals of Neyman-Pearson theory.
In Section 9.2 we introduce the various concepts involved. In Section 9.3 the fundamental
Neyman-Pearson lemma is proved, and Sections 9.4 and 9.5 deal with some basic results
in the testing of composite hypotheses. Section 9.6 deals with locally optimal tests.


9.2 SOME FUNDAMENTAL NOTIONS OF HYPOTHESES TESTING 


In Chapter 8 we discussed the problem of point estimation in sampling from a popula- 
tion whose distribution is known except for a finite number of unknown parameters. Here 
we consider another important problem in statistical inference, the testing of statistical 
hypotheses. We begin by considering the following examples. 


Example 1. In coin-tossing experiments one frequently assumes that the coin is fair,
that is, the probability of getting heads or tails is the same: $\frac{1}{2}$. How does one test whether
the coin is fair (unbiased) or loaded (biased)? If one is guided by intuition, a reasonable
procedure would be to toss the coin $n$ times and count the number of heads. If the proportion
of heads observed does not deviate "too much" from $p = \frac{1}{2}$, one would tend to
conclude that the coin is fair.


Example 2. It is usual for manufacturers to make quantitative assertions about their prod- 
ucts. For example, a manufacturer of 12-volt batteries may claim that a certain brand of his 
batteries lasts for N hours. How does one go about checking the truth of this assertion? A 
reasonable procedure suggests itself: Take a random sample of n batteries of the brand in 
question and note their length of life under more or less identical conditions. If the average 
length of life is “much smaller” than N, one would tend to doubt the manufacturer’s claim. 


To fix ideas, let us define formally the concepts involved. As usual, $\mathbf{X} = (X_1, X_2, \ldots, X_n)$
and let $\mathbf{X} \sim F_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$. It will be assumed that the functional form of $F_\theta$ is known
except for the parameter $\theta$. Also, we assume that $\Theta$ contains at least two points.


Definition 1. A parametric hypothesis is an assertion about the unknown parameter $\theta$.
It is usually referred to as the null hypothesis, $H_0\colon \theta \in \Theta_0 \subset \Theta$. The statement $H_1\colon \theta \in
\Theta_1 = \Theta - \Theta_0$ is usually referred to as the alternative hypothesis.


Usually the null hypothesis is chosen to correspond to the smaller or simpler subset $\Theta_0$
of $\Theta$ and is a statement of "no difference," whereas the alternative represents change.


Definition 2. If $\Theta_0$ ($\Theta_1$) contains only one point, we say that $\Theta_0$ ($\Theta_1$) is simple; otherwise,
composite. Thus, if a hypothesis is simple, the probability distribution of $\mathbf{X}$ is completely
specified under that hypothesis.


Example 3. Let $X \sim N(\mu,\sigma^2)$. If both $\mu$ and $\sigma^2$ are unknown, $\Theta = \{(\mu,\sigma^2)\colon -\infty <
\mu < \infty, \sigma^2 > 0\}$. The hypothesis $H_0\colon \mu \le \mu_0$, $\sigma^2 > 0$, where $\mu_0$ is a known constant, is
a composite null hypothesis. The alternative hypothesis is $H_1\colon \mu > \mu_0$, $\sigma^2 > 0$, which is
also composite. Similarly, the null hypothesis $\mu = \mu_0$, $\sigma^2 > 0$ is also composite.

If $\sigma^2 = \sigma_0^2$ is known, the hypothesis $H_0\colon \mu = \mu_0$ is a simple hypothesis.


Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $b(1,p)$ RVs. Some hypotheses of interest are $p = \frac{1}{2}$,
$p \le \frac{1}{2}$, $p \ge \frac{1}{2}$ or, quite generally, $p = p_0$, $p \le p_0$, $p \ge p_0$, where $p_0$ is a known number,
$0 < p_0 < 1$.


The problem of testing of hypotheses may be described as follows: Given the sample
point $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, find a decision rule (function) that will lead to a decision to reject
or fail to reject the null hypothesis. In other words, partition the sample space into two
disjoint sets $C$ and $C^c$ such that, if $\mathbf{x} \in C$, we reject $H_0$, and if $\mathbf{x} \in C^c$, we fail to reject $H_0$.
In the following we will write accept $H_0$ when we fail to reject $H_0$. We emphasize that when
the sample point $\mathbf{x} \in C^c$ and we fail to reject $H_0$, it does not mean that $H_0$ gets our stamp
of approval. It simply means that the sample does not have enough evidence against $H_0$.
Definition 3. Let $\mathbf{X} \sim F_\theta$, $\theta \in \Theta$. A subset $C$ of $\mathcal{R}_n$ such that if $\mathbf{x} \in C$, then $H_0$ is rejected
(with probability 1) is called the critical region (set):
\[ C = \{\mathbf{x} \in \mathcal{R}_n\colon H_0 \text{ is rejected if } \mathbf{x} \in C\}. \]
There are two types of errors that can be made if one uses such a procedure. One may
reject $H_0$ when in fact it is true, called a type I error, or accept $H_0$ when it is false, called a
type II error.

                          True
                   H_0              H_1
    Accept  H_0    Correct          Type II Error
            H_1    Type I Error     Correct

If $C$ is the critical region of a rule, $P_\theta C$, $\theta \in \Theta_0$, is a probability of type I error, and
$P_\theta C^c$, $\theta \in \Theta_1$, is a probability of type II error. Ideally, one would like to find a critical
region for which both these probabilities are 0. This will be the case if we can find a subset
$S \subseteq \mathcal{R}_n$ such that $P_\theta S = 1$ for every $\theta \in \Theta_0$ and $P_\theta S = 0$ for every $\theta \in \Theta_1$. Unfortunately,
situations such as this do not arise in practice, although they are conceivable. For example,
let $X \sim \mathcal{C}(1,\theta)$ under $H_0$ and $X \sim P(\theta)$ under $H_1$. Usually, if a critical region is such that the
probability of type I error is 0, it will be of the form "do not reject $H_0$" and the probability
of type II error will then be 1.

The procedure used in practice is to limit the probability of type I error to some preassigned
level $\alpha$ (usually 0.01 or 0.05) that is small and to minimize the probability of
type II error. To restate our problem in terms of this requirement, let us formulate these
notions.


Definition 4. Every Borel-measurable mapping $\varphi$ of $\mathcal{R}_n \to [0,1]$ is known as a test
function.


Some simple examples of test functions are $\varphi(\mathbf{x}) = 1$ for all $\mathbf{x} \in \mathcal{R}_n$, $\varphi(\mathbf{x}) = 0$ for all
$\mathbf{x} \in \mathcal{R}_n$, or $\varphi(\mathbf{x}) = \alpha$, $0 \le \alpha \le 1$, for all $\mathbf{x} \in \mathcal{R}_n$. In fact, Definition 4 includes Definition 3
in the sense that, whenever $\varphi$ is the indicator function of some Borel subset $A$ of $\mathcal{R}_n$, $A$ is
called the critical region (of the test $\varphi$).


Definition 5. The mapping $\varphi$ is said to be a test of hypothesis $H_0\colon \theta \in \Theta_0$ against the
alternatives $H_1\colon \theta \in \Theta_1$ with error probability $\alpha$ (also called level of significance or, simply,
level) if
\[ E_\theta \varphi(\mathbf{X}) \le \alpha \quad \text{for all } \theta \in \Theta_0. \tag{1} \]
We shall say, in short, that $\varphi$ is a test for the problem $(\alpha, \Theta_0, \Theta_1)$.
Let us write $\beta_\varphi(\theta) = E_\theta \varphi(\mathbf{X})$. Our objective, in practice, will be to seek a test $\varphi$ for a
given $\alpha$, $0 < \alpha < 1$, such that
\[ \sup_{\theta \in \Theta_0} \beta_\varphi(\theta) \le \alpha. \tag{2} \]
The left-hand side of (2) is usually known as the size of the test $\varphi$. Condition (1) therefore
restricts attention to tests whose size does not exceed a given level of significance $\alpha$.

The following interpretation may be given to all tests $\varphi$ satisfying $\beta_\varphi(\theta) \le \alpha$ for all
$\theta \in \Theta_0$. To every $\mathbf{x} \in \mathcal{R}_n$ we assign a number $\varphi(\mathbf{x})$, $0 \le \varphi(\mathbf{x}) \le 1$, which is the probability
of rejecting $H_0$ that $\mathbf{X} \sim f_\theta$, $\theta \in \Theta_0$, if $\mathbf{x}$ is observed. The restriction $\beta_\varphi(\theta) \le \alpha$ for $\theta \in \Theta_0$
then says that, if $H_0$ were true, $\varphi$ rejects it with a probability $\le \alpha$. We will call such a test
a randomized test function. If $\varphi(\mathbf{x}) = I_A(\mathbf{x})$, $\varphi$ will be called a nonrandomized test. If
$\mathbf{x} \in A$, we reject $H_0$ with probability 1; and if $\mathbf{x} \notin A$, this probability is 0. Needless to say,
$A \in \mathfrak{B}_n$.

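We next turn our attention to the type II error.


Definition 6. Let $\varphi$ be a test function for the problem $(\alpha, \Theta_0, \Theta_1)$. For every $\theta \in \Theta$
define
\[ \beta_\varphi(\theta) = E_\theta \varphi(\mathbf{X}) = P_\theta\{\text{Reject } H_0\}. \tag{3} \]
As a function of $\theta$, $\beta_\varphi(\theta)$ is called the power function of the test $\varphi$. For any $\theta \in \Theta_1$, $\beta_\varphi(\theta)$
is called the power of $\varphi$ against the alternative $\theta$.


In view of Definitions 5 and 6 the problem of testing of hypotheses may now be reformulated.
Let $\mathbf{X} \sim f_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$, $\Theta = \Theta_0 + \Theta_1$. Also, let $0 < \alpha < 1$ be given. Given a
sample point $\mathbf{x}$, find a test $\varphi(\mathbf{x})$ such that $\beta_\varphi(\theta) \le \alpha$ for $\theta \in \Theta_0$, and $\beta_\varphi(\theta)$ is a maximum
for $\theta \in \Theta_1$.


Definition 7. Let $\Phi_\alpha$ be the class of all tests for the problem $(\alpha, \Theta_0, \Theta_1)$. A test $\varphi_0 \in \Phi_\alpha$
is said to be a most powerful (MP) test against an alternative $\theta \in \Theta_1$ if
\[ \beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta) \quad \text{for all } \varphi \in \Phi_\alpha. \tag{4} \]
If $\Theta_1$ contains only one point, this definition suffices. If, on the other hand, $\Theta_1$ contains
at least two points, as will usually be the case, we will have an MP test corresponding to
each $\theta \in \Theta_1$.


Definition 8. A test $\varphi_0 \in \Phi_\alpha$ for the problem $(\alpha, \Theta_0, \Theta_1)$ is said to be a uniformly most
powerful (UMP) test if
\[ \beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta) \quad \text{for all } \varphi \in \Phi_\alpha, \text{ uniformly in } \theta \in \Theta_1. \tag{5} \]
Thus, if $\Theta_0$ and $\Theta_1$ are both composite, the problem is to find a UMP test $\varphi$ for the
problem $(\alpha, \Theta_0, \Theta_1)$. We will see that UMP tests very frequently do not exist, and we will
have to place further restrictions on the class of all tests, $\Phi_\alpha$.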
Note that if $\varphi_1, \varphi_2$ are two tests and $\lambda$ is a real number, $0 < \lambda < 1$, then $\lambda\varphi_1 +
(1 - \lambda)\varphi_2$ is also a test function, and it follows that the class of all test functions $\Phi_\alpha$ is
convex.


Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$ RVs, where $\mu$ is unknown but it is known that
$\mu \in \Theta = \{\mu_0, \mu_1\}$, $\mu_0 < \mu_1$. Let $H_0\colon X_i \sim N(\mu_0, 1)$, $H_1\colon X_i \sim N(\mu_1, 1)$. Both $H_0$ and $H_1$
are simple hypotheses. Intuitively, one would accept $H_0$ if the sample mean $\bar{X}$ is "closer"
to $\mu_0$ than to $\mu_1$; that is to say, one would reject $H_0$ if $\bar{X} > k$, and accept $H_0$ otherwise. The
constant $k$ is determined from the level requirements. Note that, under $H_0$, $\bar{X} \sim N(\mu_0, 1/n)$,
and, under $H_1$, $\bar{X} \sim N(\mu_1, 1/n)$. Given $0 < \alpha < 1$, we have
\[ P_{\mu_0}\{\bar{X} > k\} = P\left\{ \frac{\bar{X} - \mu_0}{1/\sqrt{n}} > \frac{k - \mu_0}{1/\sqrt{n}} \right\} = P\{\text{Type I error}\} = \alpha, \]
so that $k = \mu_0 + z_\alpha/\sqrt{n}$. The test, therefore, is (Fig. 1)
\[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \bar{x} > \mu_0 + z_\alpha/\sqrt{n}, \\ 0 & \text{otherwise}. \end{cases} \]
Here $\bar{X}$ is known as a test statistic, and the test $\varphi$ is nonrandomized with critical region
$C = \{\mathbf{x}\colon \bar{x} > \mu_0 + z_\alpha/\sqrt{n}\}$. Note that in this case the continuity of $\bar{X}$ (that is, the absolute
continuity of the DF of $\bar{X}$) allows us to achieve any size $\alpha$, $0 < \alpha < 1$.

Fig. 1. Rejection region of $H_0$ in Example 5.

The power of the test at $\mu_1$ is given by
\[ E_{\mu_1}\varphi(\mathbf{X}) = P_{\mu_1}\left\{ \bar{X} > \mu_0 + \frac{z_\alpha}{\sqrt{n}} \right\} = P\{Z > z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\}, \]
where $Z \sim N(0, 1)$. In particular, $E_{\mu_1}\varphi(\mathbf{X}) > \alpha$ since $\mu_1 > \mu_0$. The probability of type II
error is given by
\[ P\{\text{Type II error}\} = 1 - E_{\mu_1}\varphi(\mathbf{X}) = P\{Z \le z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\}. \]
Figure 2 gives a graph of the power function $\beta_\varphi(\mu)$ of $\varphi$ for $\mu > 0$ when $\mu_0 = 0$, and
$H_1\colon \mu > 0$.

Fig. 2. Power function of $\varphi$ in Example 5.
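The critical value and the power function of the test in Example 5 are easy to evaluate numerically. The sketch below uses Python's standard `statistics.NormalDist`; the function names and the sample values of $n$, $\alpha$, and $\mu$ are assumptions chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)

def critical_value(mu0, n, alpha):
    """k = mu0 + z_alpha / sqrt(n), where z_alpha is the upper-alpha normal quantile."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return mu0 + z_alpha / n ** 0.5

def power(mu, mu0, n, alpha):
    """P_mu{ Xbar > k } = P{ Z > z_alpha - sqrt(n) (mu - mu0) }."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_alpha - n ** 0.5 * (mu - mu0))

if __name__ == "__main__":
    mu0, n, alpha = 0.0, 9, 0.05
    print("k =", critical_value(mu0, n, alpha))
    for mu in (0.0, 0.25, 0.5, 1.0):
        # At mu = mu0 the power equals the size alpha; it increases with mu.
        print(mu, power(mu, mu0, n, alpha))
```

Evaluating `power` on a grid of $\mu$ values reproduces the shape of the power curve described for the case $\mu_0 = 0$.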


Example 6. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from $b(1,p)$, where $p$ is unknown and $0 <
p < 1$. Consider the simple null hypothesis $H_0\colon X_i \sim b(1, \frac{1}{2})$, that is, under $H_0$, $p = \frac{1}{2}$.
Then $H_1\colon X_i \sim b(1,p)$, $p \ne 1/2$. A reasonable procedure would be to compute the average
number of 1's, namely, $\bar{X} = \sum_{i=1}^{5} X_i/5$, and to accept $H_0$ if $|\bar{X} - \frac{1}{2}| \le c$, where $c$ is to be
determined. Let $\alpha = 0.10$. Then we would like to choose $c$ such that the size of our test
is $\alpha$, that is,
\[ 0.10 = P_{p=1/2}\left\{ \left| \bar{X} - \frac{1}{2} \right| > c \right\}, \]
or
\[ 0.90 = P_{p=1/2}\left\{ -k \le \sum_{i=1}^{5} X_i - \frac{5}{2} \le k \right\}, \tag{6} \]
where $k = 5c$. Now $\sum_{i=1}^{5} X_i \sim b(5, \frac{1}{2})$ under $H_0$, so that the PMF of $\sum_{i=1}^{5} X_i - \frac{5}{2}$ is given in
the following table.

    $\sum_{i=1}^{5} x_i - \frac{5}{2}$      $P_{p=1/2}\left\{ \sum_{i=1}^{5} X_i - \frac{5}{2} = \sum_{i=1}^{5} x_i - \frac{5}{2} \right\}$
        -2.5                    0.03125
        -1.5                    0.15625
        -0.5                    0.31250
         0.5                    0.31250
         1.5                    0.15625
         2.5                    0.03125


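Note that we cannot choose any $k$ to satisfy (6) exactly. It is clear that we have to reject
$H_0$ when $k = \pm 2.5$, that is, when we observe $\sum_{i=1}^{5} X_i = 0$ or 5. The resulting size if we use
this test is $\alpha = 0.03125 + 0.03125 = 0.0625 < 0.10$. A second procedure would be to
reject $H_0$ if $k = \pm 1.5$ or $\pm 2.5$ ($\sum_{i=1}^{5} X_i = 0, 1, 4, 5$), in which case the resulting size is $\alpha =
0.0625 + 2(0.15625) = 0.375$, which is considerably larger than 0.10. A third alternative,
if we insist on achieving $\alpha = 0.10$, is to randomize on the boundary. Instead of accepting or
rejecting $H_0$ with probability 1 when $\sum_{i=1}^{5} X_i = 1$ or 4, we reject $H_0$ with probability $\gamma$, where
\[ 0.10 = P_{p=1/2}\left\{ \sum_{i=1}^{5} X_i = 0 \text{ or } 5 \right\} + \gamma\, P_{p=1/2}\left\{ \sum_{i=1}^{5} X_i = 1 \text{ or } 4 \right\}. \]
Thus
\[ \gamma = \frac{0.10 - 0.0625}{0.3125} = \frac{0.0375}{0.3125} = 0.12. \]
A randomized test of size $\alpha = 0.10$ is therefore given by
\[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum_{i=1}^{5} x_i = 0 \text{ or } 5, \\ 0.12 & \text{if } \sum_{i=1}^{5} x_i = 1 \text{ or } 4, \\ 0 & \text{otherwise}. \end{cases} \]
The power of this test is
\[ E_p \varphi(\mathbf{X}) = P_p\left\{ \sum_{i=1}^{5} X_i = 0 \text{ or } 5 \right\} + 0.12\, P_p\left\{ \sum_{i=1}^{5} X_i = 1 \text{ or } 4 \right\}, \]
where $p \ne \frac{1}{2}$, and can be computed for any value of $p$. Figure 3 gives a graph of $\beta_\varphi(p)$.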
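As an arithmetic check on the randomization step of Example 6, the sketch below solves the size equation for the boundary probability $\gamma$ and evaluates the power at a few values of $p$. Solving the equation exactly gives $\gamma = 0.0375/0.3125 = 0.12$. The helper names are illustrative and not from the text.

```python
from math import comb

def binom_pmf(s, n, p):
    """P(S = s) for S ~ b(n, p)."""
    return comb(n, s) * p**s * (1 - p) ** (n - s)

def boundary_probability(alpha, n=5, p0=0.5):
    """gamma solving alpha = P(S in {0, n}) + gamma * P(S in {1, n-1}) under p0."""
    inner = binom_pmf(0, n, p0) + binom_pmf(n, n, p0)
    edge = binom_pmf(1, n, p0) + binom_pmf(n - 1, n, p0)
    return (alpha - inner) / edge

def power(p, gamma, n=5):
    """E_p phi(X) for the test that rejects at S in {0, n} and randomizes at {1, n-1}."""
    inner = binom_pmf(0, n, p) + binom_pmf(n, n, p)
    edge = binom_pmf(1, n, p) + binom_pmf(n - 1, n, p)
    return inner + gamma * edge

if __name__ == "__main__":
    gamma = boundary_probability(0.10)
    print("gamma =", gamma)
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(p, power(p, gamma))
```

At $p = \frac{1}{2}$ the computed power equals the size 0.10, and it rises symmetrically as $p$ moves toward 0 or 1.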
Fig. 3. Power function of $\varphi$ in Example 6.


We conclude this section with the following remarks. 


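Remark 1. The problem of testing of hypotheses may be considered as a special case of the
general decision problem described in Section 8.8. Let $\mathcal{A} = \{a_0, a_1\}$, where $a_0$ represents
the decision to accept $H_0\colon \theta \in \Theta_0$ and $a_1$ represents the decision to reject $H_0$. A decision
function $\delta$ is a mapping of $\mathcal{R}_n$ into $\mathcal{A}$. Let us introduce the following loss functions:
\[ L_1(\theta, a_1) = \begin{cases} 1 & \text{if } \theta \in \Theta_0, \\ 0 & \text{if } \theta \in \Theta_1, \end{cases} \quad \text{and} \quad L_1(\theta, a_0) = 0 \text{ for all } \theta, \]
and
\[ L_2(\theta, a_0) = \begin{cases} 0 & \text{if } \theta \in \Theta_0, \\ 1 & \text{if } \theta \in \Theta_1, \end{cases} \quad \text{and} \quad L_2(\theta, a_1) = 0 \text{ for all } \theta. \]
Then the minimization of $E_\theta L_2(\theta, \delta(\mathbf{X}))$ subject to $E_\theta L_1(\theta, \delta(\mathbf{X})) \le \alpha$ is the hypotheses
testing problem discussed above. We have
\[ E_\theta L_2(\theta, \delta(\mathbf{X})) = P_\theta\{\delta(\mathbf{X}) = a_0\}, \quad \theta \in \Theta_1, \]
\[ \phantom{E_\theta L_2(\theta, \delta(\mathbf{X}))} = P_\theta\{\text{Accept } H_0 \mid H_1 \text{ true}\}, \]
and
\[ E_\theta L_1(\theta, \delta(\mathbf{X})) = P_\theta\{\delta(\mathbf{X}) = a_1\}, \quad \theta \in \Theta_0, \]
\[ \phantom{E_\theta L_1(\theta, \delta(\mathbf{X}))} = P_\theta\{\text{Reject } H_0 \mid \theta \in \Theta_0 \text{ true}\}. \]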


Remark 2. In Example 6 we saw that the chosen size $\alpha$ is often unattainable. The choice
of a specific value of $\alpha$ is completely arbitrary and is determined by nonstatistical
considerations such as the possible consequences of rejecting $H_0$ falsely, and the economic
and practical implications of the decision to reject $H_0$. An alternative, and somewhat
subjective, approach wherever possible is to report the so-called P-value of the observed test
statistic. This is the smallest level $\alpha$ at which the observed sample statistic is significant. In
Example 6, let $S = \sum_{i=1}^{5} X_i$. If $S = 0$ is observed, then $P_{H_0}(S = 0) = P_0(S = 0) = 0.03125$.
By symmetry, if we reject $H_0$ for $S = 0$ we should do so also for $S = 5$, so the probability
of interest is $P_0(S = 0 \text{ or } 5) = 0.0625$, which is the P-value. If $S = 1$ is observed and we
decide to reject $H_0$, then we would do so also for $S = 0$ because $S = 0$ is more extreme
than $S = 1$. By symmetry considerations
\[ \text{P-value} = P_0(S \le 1 \text{ or } S \ge 4) = 2(0.03125 + 0.15625) = 0.375. \]
This discussion motivates Definition 9 below. Suppose the appropriate critical region
for testing $H_0$ against $H_1$ is one-sided. That is, suppose $C$ is either of the form $\{T \ge c_1\}$
or $\{T \le c_2\}$, where $T$ is the test statistic.


Definition 9. The probability of observing under $H_0$ a sample outcome at least as extreme
as the one observed is called the P-value. The smaller the P-value, the more extreme the
outcome and the stronger the evidence against $H_0$.


If $\alpha$ is given, then we reject $H_0$ if $P \le \alpha$ and do not reject $H_0$ if $P > \alpha$. In the two-sided
case when the critical region is of the form $C = \{|T(\mathbf{X})| > k\}$, the one-sided P-value is
doubled to obtain the P-value. If the distribution of $T$ is not symmetric then the P-value
is not well-defined in the two-sided case, although many authors recommend doubling the
one-sided P-value.
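The two P-values quoted for the coin example of Remark 2 can be reproduced with a few lines of code. This sketch assumes the symmetric null $b(5, \frac{1}{2})$ and measures "at least as extreme" by distance from $n/2$; the function name is illustrative, and the approach does not carry over to non-symmetric null distributions.

```python
from math import comb

def two_sided_p_value(s_obs, n=5, p0=0.5):
    """Sum of null probabilities of outcomes at least as far from n*p0 as s_obs."""
    centre = n * p0
    dist = abs(s_obs - centre)
    return sum(
        comb(n, s) * p0**s * (1 - p0) ** (n - s)
        for s in range(n + 1)
        if abs(s - centre) >= dist
    )

if __name__ == "__main__":
    for s in range(6):
        print(s, two_sided_p_value(s))
```

For $S = 0$ and $S = 1$ the output agrees with the values 0.0625 and 0.375 computed above.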


PROBLEMS 9.2 


1. A sample of size 1 is taken from a population distribution $P(\lambda)$. To test $H_0\colon \lambda = 1$
against $H_1\colon \lambda = 2$, consider the nonrandomized test $\varphi(x) = 1$ if $x > 3$, and $= 0$ if
$x \le 3$. Find the probabilities of type I and type II errors and the power of the test
against $\lambda = 2$. If it is required to achieve a size equal to 0.05, how should one modify
the test $\varphi$?

2. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with finite mean $\mu$ and finite
variance $\sigma^2$. Suppose that $\mu$ is not known, but $\sigma$ is known, and it is required to test $\mu = \mu_0$
against $\mu = \mu_1$ ($\mu_1 > \mu_0$). Let $n$ be sufficiently large so that the central limit theorem
holds, and consider the test
\[ \varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \bar{x} > k, \\ 0 & \text{if } \bar{x} \le k, \end{cases} \]
where $\bar{x} = n^{-1} \sum_{i=1}^{n} x_i$. Find $k$ such that the test has (approximately) size $\alpha$. What is
the power of this test at $\mu = \mu_1$? If the probabilities of type I and type II errors are
fixed at $\alpha$ and $\beta$, respectively, find the smallest sample size needed.


3. In Problem 2, if $\sigma$ is not known, find $k$ such that the test $\varphi$ has size $\alpha$.
4. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$. For testing $\mu \le \mu_0$ against $\mu > \mu_0$
consider the test function
\[ \varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \bar{x} > \mu_0 + \dfrac{z_\alpha}{\sqrt{n}}, \\ 0 & \text{if } \bar{x} \le \mu_0 + \dfrac{z_\alpha}{\sqrt{n}}. \end{cases} \]
Show that the power function of $\varphi$ is a nondecreasing function of $\mu$. What is the size
of the test?


5. A sample of size 1 is taken from an exponential PDF with parameter $\theta$, that is,
$X \sim G(1,\theta)$. To test $H_0\colon \theta = 1$ against $H_1\colon \theta > 1$, the test to be used is the
nonrandomized test
\[ \varphi(x) = \begin{cases} 1 & \text{if } x > 2, \\ 0 & \text{if } x \le 2. \end{cases} \]
Find the size of the test. What is the power function?


6. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(0,\sigma^2)$. To test $H_0\colon \sigma = \sigma_0$ against $H_1\colon
\sigma \ne \sigma_0$, it is suggested that the test
\[ \varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum x_i^2 > c_1 \text{ or } \sum x_i^2 < c_2, \\ 0 & \text{if } c_2 \le \sum x_i^2 \le c_1, \end{cases} \]
be used. How will you find $c_1$ and $c_2$ such that the size of $\varphi$ is a preassigned number
$\alpha$, $0 < \alpha < 1$? What is the power function of this test?


7. An urn contains 10 marbles, of which $M$ are white and $10 - M$ are black. To test that
$M = 5$ against the alternative hypothesis that $M = 6$, one draws 3 marbles from the
urn without replacement. The null hypothesis is rejected if the sample contains 2 or
3 white marbles; otherwise it is accepted. Find the size of the test and its power.


9.3. NEYMAN-PEARSON LEMMA 


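In this section we prove the fundamental lemma due to Neyman and Pearson [76], which
gives a general method for finding a best (most powerful) test of a simple hypothesis
against a simple alternative. Let $\{f_\theta, \theta \in \Theta\}$, where $\Theta = \{\theta_0, \theta_1\}$, be a family of possible
distributions of $\mathbf{X}$. Also, $f_\theta$ represents the PDF of $\mathbf{X}$ if $\mathbf{X}$ is a continuous type RV, and the
PMF of $\mathbf{X}$ if $\mathbf{X}$ is of the discrete type. Let us write $f_0(\mathbf{x}) = f_{\theta_0}(\mathbf{x})$ and $f_1(\mathbf{x}) = f_{\theta_1}(\mathbf{x})$ for
convenience.


Theorem 1 (The Neyman-Pearson Fundamental Lemma).

(a) Any test $\varphi$ of the form
\[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } f_1(\mathbf{x}) > k f_0(\mathbf{x}), \\ \gamma(\mathbf{x}) & \text{if } f_1(\mathbf{x}) = k f_0(\mathbf{x}), \\ 0 & \text{if } f_1(\mathbf{x}) < k f_0(\mathbf{x}), \end{cases} \tag{1} \]
for some $k \ge 0$ and $0 \le \gamma(\mathbf{x}) \le 1$, is most powerful of its size for testing $H_0\colon \theta = \theta_0$
against $H_1\colon \theta = \theta_1$. If $k = \infty$, the test
\[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } f_0(\mathbf{x}) = 0, \\ 0 & \text{if } f_0(\mathbf{x}) > 0, \end{cases} \tag{2} \]
is most powerful of size 0 for testing $H_0$ against $H_1$.

(b) Given $\alpha$, $0 \le \alpha \le 1$, there exists a test of form (1) or (2) with $\gamma(\mathbf{x}) = \gamma$ (a constant),
for which $E_{\theta_0}\varphi(\mathbf{X}) = \alpha$.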


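Proof. Let $\varphi$ be a test satisfying (1), and $\varphi^*$ be any test with $E_{\theta_0}\varphi^*(\mathbf{X}) \le E_{\theta_0}\varphi(\mathbf{X})$. In
the continuous case
\[ \int (\varphi(\mathbf{x}) - \varphi^*(\mathbf{x}))(f_1(\mathbf{x}) - k f_0(\mathbf{x}))\, d\mathbf{x} = \left( \int_{f_1 > k f_0} + \int_{f_1 < k f_0} \right) (\varphi(\mathbf{x}) - \varphi^*(\mathbf{x}))(f_1(\mathbf{x}) - k f_0(\mathbf{x}))\, d\mathbf{x}. \]
For any $\mathbf{x} \in \{f_1(\mathbf{x}) > k f_0(\mathbf{x})\}$, $\varphi(\mathbf{x}) - \varphi^*(\mathbf{x}) = 1 - \varphi^*(\mathbf{x}) \ge 0$, so that the integrand is $\ge 0$.
For $\mathbf{x} \in \{f_1(\mathbf{x}) < k f_0(\mathbf{x})\}$, $\varphi(\mathbf{x}) - \varphi^*(\mathbf{x}) = -\varphi^*(\mathbf{x}) \le 0$, so that the integrand is again $\ge 0$.
It follows that
\[ \int (\varphi(\mathbf{x}) - \varphi^*(\mathbf{x}))(f_1(\mathbf{x}) - k f_0(\mathbf{x}))\, d\mathbf{x} = E_{\theta_1}\varphi(\mathbf{X}) - E_{\theta_1}\varphi^*(\mathbf{X}) - k\left( E_{\theta_0}\varphi(\mathbf{X}) - E_{\theta_0}\varphi^*(\mathbf{X}) \right) \ge 0, \]
which implies
\[ E_{\theta_1}\varphi(\mathbf{X}) - E_{\theta_1}\varphi^*(\mathbf{X}) \ge k\left( E_{\theta_0}\varphi(\mathbf{X}) - E_{\theta_0}\varphi^*(\mathbf{X}) \right) \ge 0 \]
since $E_{\theta_0}\varphi^*(\mathbf{X}) \le E_{\theta_0}\varphi(\mathbf{X})$.

If $k = \infty$, any test $\varphi^*$ of size 0 must vanish on the set $\{f_0(\mathbf{x}) > 0\}$. We have
\[ E_{\theta_1}\varphi(\mathbf{X}) - E_{\theta_1}\varphi^*(\mathbf{X}) = \int_{\{f_0(\mathbf{x}) = 0\}} (1 - \varphi^*(\mathbf{x})) f_1(\mathbf{x})\, d\mathbf{x} \ge 0. \]
The proof for the discrete case requires the usual change of integral by a sum throughout.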
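To prove (b) we need to restrict ourselves to the case where $0 < \alpha \le 1$, since the MP
size 0 test is given by (2). Let $\gamma(\mathbf{x}) = \gamma$, and let us compute the size of a test of form (1).
We have
\[ E_{\theta_0}\varphi(\mathbf{X}) = P_{\theta_0}\{f_1(\mathbf{X}) > k f_0(\mathbf{X})\} + \gamma P_{\theta_0}\{f_1(\mathbf{X}) = k f_0(\mathbf{X})\} \]
\[ \phantom{E_{\theta_0}\varphi(\mathbf{X})} = 1 - P_{\theta_0}\{f_1(\mathbf{X}) \le k f_0(\mathbf{X})\} + \gamma P_{\theta_0}\{f_1(\mathbf{X}) = k f_0(\mathbf{X})\}. \]
Since $P_{\theta_0}\{f_0(\mathbf{X}) = 0\} = 0$, we may rewrite $E_{\theta_0}\varphi(\mathbf{X})$ as
\[ E_{\theta_0}\varphi(\mathbf{X}) = 1 - P_{\theta_0}\left\{ \frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} \le k \right\} + \gamma P_{\theta_0}\left\{ \frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} = k \right\}. \tag{3} \]
Given $0 < \alpha \le 1$, we wish to find $k$ and $\gamma$ such that $E_{\theta_0}\varphi(\mathbf{X}) = \alpha$, that is,
\[ P_{\theta_0}\left\{ \frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} \le k \right\} - \gamma P_{\theta_0}\left\{ \frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} = k \right\} = 1 - \alpha. \tag{4} \]
Note that
\[ P_{\theta_0}\left\{ \frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} \le k \right\} \]
is a DF so that it is a nondecreasing and right continuous function of $k$. If there exists a $k_0$
such that
\[ P_{\theta_0}\left\{ \frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} \le k_0 \right\} = 1 - \alpha, \]
we choose $\gamma = 0$ and $k = k_0$. Otherwise there exists a $k_0$ such that
\[ P_{\theta_0}\left\{ \frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} < k_0 \right\} \le 1 - \alpha < P_{\theta_0}\left\{ \frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} \le k_0 \right\}, \tag{5} \]
that is, there is a jump at $k_0$ (see Fig. 1). In this case we choose $k = k_0$ and
\[ \gamma = \frac{P_{\theta_0}\{f_1(\mathbf{X})/f_0(\mathbf{X}) \le k_0\} - (1 - \alpha)}{P_{\theta_0}\{f_1(\mathbf{X})/f_0(\mathbf{X}) = k_0\}}. \tag{6} \]
Since $\gamma$ given by (6) satisfies (4), and $0 \le \gamma \le 1$, the proof is complete.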


Remark 1. It is possible to show (see Problem 6) that the test given by (1) or (2) is unique
(except on a null set), that is, if $\varphi$ is an MP test of size $\alpha$ of $H_0$ against $H_1$, it must have
form (1) or (2), except perhaps for a set $A$ with $P_{\theta_0}(A) = P_{\theta_1}(A) = 0$.


Remark 2. An analysis of proof of part (a) of Theorem 1 shows that test (1) is MP even if
$f_1$ and $f_0$ are not necessarily densities.


Theorem 2. If a sufficient statistic $T$ exists for the family $\{f_\theta\colon \theta \in \Theta\}$, $\Theta = \{\theta_0, \theta_1\}$, the
Neyman-Pearson MP test is a function of $T$.


Proof. The proof of this result is left as an exercise.


Remark 3. If the family $\{f_\theta\colon \theta \in \Theta\}$ admits a sufficient statistic, one can restrict attention
to tests based on the sufficient statistic, that is, to tests that are functions of the sufficient
statistic. If $\varphi$ is a test function and $T$ is a sufficient statistic, $E\{\varphi(\mathbf{X}) \mid T\}$ is itself a test
function, $0 \le E\{\varphi(\mathbf{X}) \mid T\} \le 1$, and
\[ E_\theta\{E\{\varphi(\mathbf{X}) \mid T\}\} = E_\theta \varphi(\mathbf{X}), \]
so that $\varphi$ and $E\{\varphi \mid T\}$ have the same power function.

Fig. 1

Example 1. Let $X$ be an RV with PMF under $H_0$ and $H_1$ given by

    x        |  1      2      3      4      5      6
    f_0(x)   |  0.01   0.01   0.01   0.01   0.01   0.95
    f_1(x)   |  0.05   0.04   0.03   0.02   0.01   0.85

Then $\lambda(x) = f_1(x)/f_0(x)$ is given by

    x        |  1      2      3      4      5      6
    λ(x)     |  5      4      3      2      1      0.89

If $\alpha = 0.03$, for example, then the Neyman-Pearson MP size 0.03 test rejects $H_0$ if $\lambda(X) \ge 3$,
that is, if $X \le 3$, and has power
\[ P_1(X \le 3) = 0.05 + 0.04 + 0.03 = 0.12 \]
with $P(\text{Type II error}) = 1 - 0.12 = 0.88$.
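The construction in the proof of part (b) of the lemma can be carried out mechanically on a finite sample space: order the points by likelihood ratio, reject outright while the accumulated null probability stays within $\alpha$, and randomize on the level where it would be exceeded. The sketch below is one possible implementation under that reading; the function name, the dictionary representation of the PMFs, and the tolerance are assumptions made for illustration.

```python
def neyman_pearson(f0, f1, alpha, tol=1e-12):
    """Return (phi, k, gamma); phi maps each x to its probability of rejecting H0."""
    # Likelihood ratio; points with f0 = 0 get ratio +infinity and are always rejected.
    ratio = {x: (f1[x] / f0[x] if f0[x] > 0 else float("inf")) for x in f0}
    phi = {x: 0.0 for x in f0}
    used = 0.0                    # null probability already assigned to rejection
    k = float("inf")
    for k in sorted(set(ratio.values()), reverse=True):
        level = [x for x in f0 if ratio[x] == k]
        mass = sum(f0[x] for x in level)
        if used + mass <= alpha + tol:
            for x in level:       # whole level fits: reject with probability 1
                phi[x] = 1.0
            used += mass
        else:
            gamma = max(0.0, (alpha - used) / mass)
            for x in level:       # boundary level: randomize
                phi[x] = gamma
            return phi, k, gamma
    return phi, k, 1.0            # alpha large enough to reject everywhere

if __name__ == "__main__":
    f0 = {1: 0.01, 2: 0.01, 3: 0.01, 4: 0.01, 5: 0.01, 6: 0.95}
    f1 = {1: 0.05, 2: 0.04, 3: 0.03, 4: 0.02, 5: 0.01, 6: 0.85}
    phi, k, gamma = neyman_pearson(f0, f1, 0.03)
    print(phi, k, gamma)
    print("size :", sum(phi[x] * f0[x] for x in f0))
    print("power:", sum(phi[x] * f1[x] for x in f1))
```

On the table of Example 1 with $\alpha = 0.03$ the routine stops at the level $\lambda = 2$ with $\gamma$ equal to zero up to rounding, which is the same test as "reject if $\lambda(X) \ge 3$".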
Example 2. Let $X \sim N(0,1)$ under $H_0$ and $X \sim \mathcal{C}(1,0)$ under $H_1$. To find an MP size $\alpha$
test of $H_0$ against $H_1$, we compute
\[ \lambda(x) = \frac{f_1(x)}{f_0(x)} = \frac{(1/\pi)[1/(1 + x^2)]}{(1/\sqrt{2\pi})\, e^{-x^2/2}} = \sqrt{\frac{2}{\pi}}\, \frac{e^{x^2/2}}{1 + x^2}. \]
Figure 2 gives a graph of $\lambda(x)$ and we note that $\lambda$ has a local maximum at $x = 0$ and two
minima at $x = \pm 1$. Note that $\lambda(0) = 0.7979$ and $\lambda(\pm 1) = 0.6578$, so for $k \in (0.6578, 0.7979)$,
$\lambda(x) = k$ intersects the graph at four points and the critical region is of the form $|X| \le k_1$ or
$|X| \ge k_2$, where $k_1$ and $k_2$ are solutions of $\lambda(x) = k$. For $k = 0.7979$, the critical region is of
the form $|X| \ge k_0$, where $k_0$ is the positive solution of $e^{k_0^2/2} = 1 + k_0^2$, so that $k_0 \approx 1.59$ with
$\alpha = 0.1118$. For $k < 0.6578$, $\alpha = 1$ and for $k = 0.6578$, the critical region is $|X| \ge 1$ with
$\alpha = 0.3413$. For the traditional level $\alpha = 0.05$, the critical region is of the form $|X| \ge 1.96$.
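The likelihood ratio of Example 2 and the threshold $k_0$ can be evaluated numerically. The sketch below assumes the ratio $\lambda(x) = \sqrt{2/\pi}\, e^{x^2/2}/(1 + x^2)$ obtained above; the function names and the bisection bracket $[1, 3]$ are illustrative choices.

```python
import math
from statistics import NormalDist

def lam(x):
    """Ratio of the Cauchy C(1,0) density to the N(0,1) density."""
    return math.sqrt(2.0 / math.pi) * math.exp(0.5 * x * x) / (1.0 + x * x)

def size_of_tail_region(c):
    """P(|X| >= c) when X ~ N(0,1)."""
    return 2.0 * (1.0 - NormalDist().cdf(c))

def solve_k0(lo=1.0, hi=3.0, iters=60):
    """Bisection for the positive root of exp(x^2/2) = 1 + x^2, i.e. lam(x) = lam(0)."""
    def g(x):
        return 0.5 * x * x - math.log1p(x * x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print("lambda(0), lambda(1):", lam(0.0), lam(1.0))
    k0 = solve_k0()
    print("k0:", k0, "size:", size_of_tail_region(k0))
    print("size of |X| >= 1.96:", size_of_tail_region(1.96))
```

Because $\lambda(x)$ grows without bound as $|x|$ increases, thresholds above $\lambda(0)$ always give two-tailed regions of the form $|X| \ge c$, which is why the level 0.05 test uses the familiar cut-off 1.96.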


Example 3. Let X;,X>,...,X, be iid b(1,p) RVs, and let Ho: p = po, Hi: p=Ppi,P1 > Po- 
The MP size a test of Hp against Hj, is of the form 


pei 


, yon _ > oy 
p(X1,%2,--- Xn) = Po (1 Po) > 
yy, A(x) =k, 
0, A(x) <k, 
A 


ee ee een eee ae A(0) = 0.7979 
}-- A(1)=0.6578 
> 
ko -l -k; 0 ky 1 ky x 
Fig. 2. Graph of A(x) = (2/m)1/2 2x2 /2) 


(+37) 
where $k$ and $\gamma$ are determined from

$$E_{p_0}\varphi(X) = \alpha.$$

Now

$$\lambda(x) = \left(\frac{p_1}{p_0}\right)^{\sum x_i}\left(\frac{1-p_1}{1-p_0}\right)^{n-\sum x_i},$$

and since $p_1 > p_0$, $\lambda(x)$ is an increasing function of $\sum x_i$. It follows that $\lambda(x) > k$ if and
only if $\sum x_i > k_1$, where $k_1$ is some constant. Thus the MP size $\alpha$ test is of the form

$$\varphi(x) = \begin{cases} 1 & \text{if } \sum x_i > k_1, \\ \gamma & \text{if } \sum x_i = k_1, \\ 0 & \text{otherwise.} \end{cases}$$


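Also, $k_1$ and $\gamma$ are determined from

$$\alpha = E_{p_0}\varphi(X) = P_{p_0}\Big\{\sum X_i > k_1\Big\} + \gamma P_{p_0}\Big\{\sum X_i = k_1\Big\} = \sum_{r=k_1+1}^{n} \binom{n}{r} p_0^r (1-p_0)^{n-r} + \gamma \binom{n}{k_1} p_0^{k_1}(1-p_0)^{n-k_1}.$$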


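Note that the MP size $\alpha$ test is independent of $p_1$ as long as $p_1 > p_0$, that is, it remains an
MP size $\alpha$ test against any $p > p_0$ and is therefore a UMP test of $p = p_0$ against $p > p_0$.
In particular, let $n = 5$, $p_0 = 1/2$, $p_1 = 3/4$, and $\alpha = 0.05$. Then the MP test is given by

$$\varphi(x) = \begin{cases} 1, & \sum x_i > k, \\ \gamma, & \sum x_i = k, \\ 0, & \sum x_i < k, \end{cases}$$

where $k$ and $\gamma$ are determined from

$$0.05 = \alpha = \sum_{r=k+1}^{5} \binom{5}{r}\left(\frac{1}{2}\right)^5 + \gamma\binom{5}{k}\left(\frac{1}{2}\right)^5.$$

Since $P_{1/2}\{\sum X_i = 5\} = 1/32$ and $P_{1/2}\{\sum X_i = 4\} = 5/32$, it follows that $k = 4$ and
$\gamma = (0.05 - 1/32)/(5/32) = 0.12$. Thus the MP size $\alpha = 0.05$ test is to reject $p = 1/2$ in
favor of $p = 3/4$ if $\sum X_i = 5$ and reject $p = 1/2$ with probability 0.12 if $\sum X_i = 4$.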
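
The search for the cutoff and the randomization constant in the numerical case above can be written as a short routine. This is a hedged sketch rather than the text's own procedure: the function name `binomial_ump_cutoff` is illustrative, and the loop simply accumulates the upper tail of $b(n, p_0)$ until adding one more point would exceed $\alpha$. For $n = 5$, $p_0 = 1/2$, $\alpha = 0.05$, exact arithmetic gives $k = 4$ and $\gamma = (0.05 - 1/32)/(5/32) = 0.12$.

```python
from math import comb

def binomial_ump_cutoff(n, p0, alpha):
    """Return (k, gamma) with P(T > k) + gamma * P(T = k) = alpha, T ~ b(n, p0)."""
    pmf = [comb(n, r) * p0**r * (1 - p0) ** (n - r) for r in range(n + 1)]
    tail = 0.0  # running value of P(T > k), built from the top down
    for k in range(n, -1, -1):
        if tail + pmf[k] > alpha:
            # Rejecting outright at k would overshoot alpha, so randomize at k.
            return k, (alpha - tail) / pmf[k]
        tail += pmf[k]
    return -1, 0.0  # only reached when alpha >= 1: the test always rejects

k, gamma = binomial_ump_cutoff(n=5, p0=0.5, alpha=0.05)
print(k, round(gamma, 3))
```

Because the routine depends only on $p_0$ and $\alpha$, it reflects the remark that the same test is MP against every $p_1 > p_0$.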

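It is simply a matter of reversing inequalities to see that the MP size $\alpha$ test of $H_0\colon p = p_0$
against $H_1\colon p = p_1$ ($p_1 < p_0$) is given by

$$\varphi(x) = \begin{cases} 1 & \text{if } \sum x_i < k, \\ \gamma & \text{if } \sum x_i = k, \\ 0 & \text{if } \sum x_i > k, \end{cases}$$

where $\gamma$ and $k$ are determined from $E_{p_0}\varphi(X) = \alpha$.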
We note that $T(X) = \sum X_i$ is minimal sufficient for $p$ so that, in view of Remark 3, we
could have considered tests based only on $T$. Since $T \sim b(n,p)$,

$$\lambda(t) = \frac{f_1(t)}{f_0(t)} = \frac{\binom{n}{t} p_1^t (1-p_1)^{n-t}}{\binom{n}{t} p_0^t (1-p_0)^{n-t}},$$

so that an MP test is of the same form as above but the computation is somewhat simpler.

We remark that in both cases ($p_1 > p_0$, $p_1 < p_0$) the MP test is quite intuitive. We would
tend to accept the larger probability if a larger number of "successes" showed up, and
the smaller probability if a smaller number of "successes" were observed. See, however,
Example 2.


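Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown.
We wish to test the null hypothesis $H_0\colon \mu = \mu_0$, $\sigma^2 = \sigma_0^2$ against the alternative $H_1\colon \mu = \mu_1$,
$\sigma^2 = \sigma_0^2$. The fundamental lemma leads to the following MP test:

$$\varphi(x) = \begin{cases} 1 & \text{if } \lambda(x) > k, \\ 0 & \text{if } \lambda(x) < k, \end{cases}$$

where

$$\lambda(x) = \frac{(1/\sigma_0\sqrt{2\pi})^n \exp\{-[\sum (x_i - \mu_1)^2/2\sigma_0^2]\}}{(1/\sigma_0\sqrt{2\pi})^n \exp\{-[\sum (x_i - \mu_0)^2/2\sigma_0^2]\}}$$

and $k$ is determined from $E_{\mu_0,\sigma_0}\varphi(X) = \alpha$. We have

$$\lambda(x) = \exp\left\{\sum x_i\left(\frac{\mu_1}{\sigma_0^2} - \frac{\mu_0}{\sigma_0^2}\right) + n\left(\frac{\mu_0^2}{2\sigma_0^2} - \frac{\mu_1^2}{2\sigma_0^2}\right)\right\}.$$

If $\mu_1 > \mu_0$, then

$$\lambda(x) > k \quad \text{if and only if} \quad \sum_{i=1}^n x_i > k',$$

where $k'$ is determined from

$$\alpha = P_{\mu_0,\sigma_0}\Big\{\sum_{i=1}^n X_i > k'\Big\} = P\left\{\frac{\sum X_i - n\mu_0}{\sqrt{n}\,\sigma_0} > \frac{k' - n\mu_0}{\sqrt{n}\,\sigma_0}\right\},$$

giving $k' = z_\alpha\sqrt{n}\,\sigma_0 + n\mu_0$. The case $\mu_1 < \mu_0$ is treated similarly. If $\sigma_0$ is known, the
test determined above is independent of $\mu_1$ as long as $\mu_1 > \mu_0$, and it follows that the
test is UMP against $H_1'\colon \mu > \mu_0$, $\sigma^2 = \sigma_0^2$. If, however, $\sigma_0$ is not known, that is, the null
hypothesis is a composite hypothesis $H_0''\colon \mu = \mu_0$, $\sigma^2 > 0$ to be tested against the alternatives
$H_1''\colon \mu = \mu_1$, $\sigma^2 > 0$ ($\mu_1 > \mu_0$), then the MP test determined above depends on $\sigma^2$.
In other words, an MP test against the alternative $\mu_1, \sigma_0^2$ will not be MP against $\mu_1, \sigma_1^2$,
where $\sigma_1^2 \neq \sigma_0^2$.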
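
For known $\sigma_0$, the cutoff $k' = z_\alpha\sqrt{n}\,\sigma_0 + n\mu_0$ and the resulting power are easy to evaluate. The sketch below uses Python's standard-library `statistics.NormalDist`; the function names and the sample values ($n = 25$, $\mu_0 = 0$, $\mu_1 = 0.5$, $\sigma_0 = 1$) are illustrative assumptions, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist

def normal_mean_cutoff(n, mu0, sigma0, alpha):
    """Cutoff k' for rejecting H0 when sum(x) > k': z_alpha*sqrt(n)*sigma0 + n*mu0."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper-alpha point of N(0, 1)
    return z_alpha * sqrt(n) * sigma0 + n * mu0

def power(n, mu0, mu1, sigma0, alpha):
    """P(sum X_i > k') when the X_i are iid N(mu1, sigma0^2)."""
    k_prime = normal_mean_cutoff(n, mu0, sigma0, alpha)
    # Under mu1 the sum is N(n*mu1, n*sigma0^2).
    return 1.0 - NormalDist(n * mu1, sqrt(n) * sigma0).cdf(k_prime)

print(normal_mean_cutoff(n=25, mu0=0.0, sigma0=1.0, alpha=0.05))
print(power(n=25, mu0=0.0, mu1=0.5, sigma0=1.0, alpha=0.05))
```

Note that `normal_mean_cutoff` takes no argument for $\mu_1$, which mirrors the observation that the test does not depend on the particular alternative as long as $\mu_1 > \mu_0$.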
PROBLEMS 9.3

1. A sample of size 1 is taken from PDF

$$f_\theta(x) = \begin{cases} \dfrac{2}{\theta^2}(\theta - x) & \text{if } 0 < x < \theta, \\ 0 & \text{otherwise.} \end{cases}$$

Find an MP test of $H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$ ($\theta_1 < \theta_0$).

2. Find the Neyman-Pearson size $\alpha$ test of $H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$ ($\theta_1 < \theta_0$)
based on a sample of size 1 from the PDF

$$f_\theta(x) = 2\theta x + 2(1-\theta)(1-x), \quad 0 < x < 1, \quad \theta \in [0,1].$$

3. Find the Neyman-Pearson size $\alpha$ test of $H_0\colon \beta = 1$ against $H_1\colon \beta = \beta_1$ ($> 1$) based
on a sample of size 1 from

$$f(x; \beta) = \begin{cases} \beta x^{\beta - 1}, & 0 < x < 1, \\ 0, & \text{otherwise.} \end{cases}$$

4. Find an MP size $\alpha$ test of $H_0\colon X \sim f_0(x)$, where $f_0(x) = (2\pi)^{-1/2}e^{-x^2/2}$, $-\infty < x < \infty$,
against $H_1\colon X \sim f_1(x)$, where $f_1(x) = 2^{-1}e^{-|x|}$, $-\infty < x < \infty$, based on a sample
of size 1.

5. For the PDF $f_\theta(x) = e^{-(x-\theta)}$, $x \geq \theta$, find an MP size $\alpha$ test of $\theta = \theta_0$ against $\theta = \theta_1$
($> \theta_0$), based on a sample of size $n$.

6. If $\varphi^*$ is an MP size $\alpha$ test of $H_0\colon X \sim f_0(x)$ against $H_1\colon X \sim f_1(x)$, show that it has
to be either of form (1) or form (2) (except for a set of $x$ that has probability 0 under
$H_0$ and $H_1$).

7. Let $\varphi^*$ be an MP size $\alpha$ ($0 < \alpha < 1$) test of $H_0$ against $H_1$, and let $k(\alpha)$ denote the
value of $k$ in (1). Show that if $\alpha_1 < \alpha_2$, then $k(\alpha_2) \leq k(\alpha_1)$.

8. For the family of Neyman-Pearson tests show that the larger the $\alpha$, the smaller the
$\beta$ ($= P[\text{Type II error}]$).

9. Let $1 - \beta$ be the power of an MP size $\alpha$ test, where $0 < \alpha < 1$. Show that $\alpha < 1 - \beta$
unless $P_{\theta_0} = P_{\theta_1}$.

10. Let $\alpha$ be a real number, $0 < \alpha < 1$, and $\varphi^*$ be an MP size $\alpha$ test of $H_0$ against $H_1$.
Also, let $\beta = E_{H_1}\varphi^*(X) < 1$. Show that $1 - \varphi^*$ is an MP test for testing $H_1$ against
$H_0$ at level $1 - \beta$.

11. Let $X_1, X_2, \ldots, X_n$ be a random sample from PDF

$$f_\theta(x) = \frac{\theta}{x^2} \quad \text{if } 0 < \theta \leq x < \infty.$$

Find an MP test of $\theta = \theta_0$ against $\theta = \theta_1$ ($\neq \theta_0$).

12. Let $X$ be an observation in $(0,1)$. Find an MP size $\alpha$ test of $H_0\colon X \sim f(x) = 4x$ if
$0 < x < 1/2$, and $= 4 - 4x$ if $1/2 \leq x < 1$, against $H_1\colon X \sim f(x) = 1$ if $0 < x < 1$. Find
the power of your test.
13. In each of the following cases of simple versus simple hypotheses $H_0\colon X \sim f_0$, $H_1\colon
X \sim f_1$, draw a graph of the ratio $\lambda(x) = f_1(x)/f_0(x)$ and find the form of the Neyman-Pearson
test:

(a) $f_0(x) = (1/2)\exp\{-|x+1|\}$; $f_1(x) = (1/2)\exp\{-|x-1|\}$.
(b) $f_0(x) = (1/2)\exp\{-|x|\}$; $f_1(x) = 1/[\pi(1+x^2)]$.
(c) $f_0(x) = (1/\pi)\{1+(1+x)^2\}^{-1}$; $f_1(x) = (1/\pi)\{1+(1-x)^2\}^{-1}$.


14. Let $X_1, X_2, \ldots, X_n$ be a random sample with common PDF

$$f_\theta(x) = \frac{1}{2\theta}\exp\{-|x|/\theta\}, \quad x \in \mathbb{R}, \ \theta > 0.$$

Find a size $\alpha$ MP test for testing $H_0\colon \theta = \theta_0$ versus $H_1\colon \theta = \theta_1$ ($> \theta_0$).

15. Let $X \sim f_j$, $j = 0, 1$, where

x        |  1    2    3    4    5
$f_0(x)$ | 1/5  1/5  1/5  1/5  1/5
$f_1(x)$ | 1/6  1/4  1/6  1/4  1/6

(a) Find the form of the MP test of its size.
(b) Find the size and the power of your test for various values of the cutoff point.
(c) Consider now a random sample of size $n$ from $f_0$ under $H_0$ or $f_1$ under $H_1$. Find
the form of the MP test of its size.


9.4 FAMILIES WITH MONOTONE LIKELIHOOD RATIO 


In this section we consider the problem of testing one-sided hypotheses on a single real-valued
parameter. Let $\{f_\theta, \theta \in \Theta\}$ be a family of PDFs (PMFs), $\Theta \subseteq \mathbb{R}$, and suppose that
we wish to test $H_0\colon \theta \leq \theta_0$ against the alternatives $H_1\colon \theta > \theta_0$ or its dual, $H_0'\colon \theta \geq \theta_0$,
against $H_1'\colon \theta < \theta_0$. In general, it is not possible to find a UMP test for this problem. The
MP test of $H_0\colon \theta \leq \theta_0$, say, against the alternative $\theta = \theta_1$ ($> \theta_0$) depends on $\theta_1$ and cannot
be UMP. Here we consider a special class of distributions that is large enough to include the
one-parameter exponential family, for which a UMP test of a one-sided hypothesis exists.


Definition 1. Let $\{f_\theta, \theta \in \Theta\}$ be a family of PDFs (PMFs), $\Theta \subseteq \mathbb{R}$. We say that $\{f_\theta\}$ has
a monotone likelihood ratio (MLR) in statistic $T(x)$ if for $\theta_1 < \theta_2$, whenever $f_{\theta_1}, f_{\theta_2}$ are
distinct, the ratio $f_{\theta_2}(x)/f_{\theta_1}(x)$ is a nondecreasing function of $T(x)$ for the set of values $x$
for which at least one of $f_{\theta_1}$ and $f_{\theta_2}$ is $> 0$.


It is also possible to define families of densities with nonincreasing MLR in T(x), but 
such families can be treated by symmetry. 


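Example 1. Let $X_1, X_2, \ldots, X_n \sim U[0,\theta]$, $\theta > 0$. The joint PDF of $X_1, \ldots, X_n$ is

$$f_\theta(x) = \begin{cases} \dfrac{1}{\theta^n}, & 0 \leq \max x_i \leq \theta, \\ 0, & \text{otherwise.} \end{cases}$$

Let $\theta_2 > \theta_1$, and consider the ratio

$$\frac{f_{\theta_2}(x)}{f_{\theta_1}(x)} = \frac{(1/\theta_2^n)\, I_{[\max x_i \leq \theta_2]}}{(1/\theta_1^n)\, I_{[\max x_i \leq \theta_1]}} = \left(\frac{\theta_1}{\theta_2}\right)^n \frac{I_{[\max x_i \leq \theta_2]}}{I_{[\max x_i \leq \theta_1]}}.$$

Let

$$R(x) = \frac{I_{[\max x_i \leq \theta_2]}}{I_{[\max x_i \leq \theta_1]}} = \begin{cases} 1, & \max x_i \in [0, \theta_1], \\ \infty, & \max x_i \in (\theta_1, \theta_2]. \end{cases}$$

Define $R(x) = \infty$ if $\max x_i > \theta_2$. It follows that $f_{\theta_2}/f_{\theta_1}$ is a nondecreasing function of
$\max_{1 \leq i \leq n} x_i$, and the family of uniform densities on $(0, \theta]$ has an MLR in $\max_{1 \leq i \leq n} x_i$.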
Theorem 1. The one-parameter exponential family

$$f_\theta(x) = \exp\{Q(\theta)T(x) + S(x) + D(\theta)\}, \tag{1}$$

where $Q(\theta)$ is nondecreasing, has an MLR in $T(x)$.

Proof. The proof is left as an exercise.


Remark 1. The nondecreasingness of $Q(\theta)$ can be obtained by a reparametrization, putting
$\vartheta = Q(\theta)$, if necessary.


Theorem 1 includes normal, binomial, Poisson, gamma (one parameter fixed), beta
(one parameter fixed), and so on. In Example 1 we have already seen that $U[0,\theta]$, which
is not an exponential family, has an MLR.


Example 2. Let $X \sim C(1,\theta)$. Then

$$\frac{f_{\theta_2}(x)}{f_{\theta_1}(x)} = \frac{1 + (x-\theta_1)^2}{1 + (x-\theta_2)^2} \to 1 \quad \text{as } x \to \pm\infty,$$

and we see that $C(1,\theta)$ does not have an MLR: a ratio that is not constant but tends to the
same limit at both ends cannot be monotone in $x$.


Theorem 2. Let $X \sim f_\theta$, $\theta \in \Theta$, where $\{f_\theta\}$ has an MLR in $T(x)$. For testing $H_0\colon \theta \leq \theta_0$
against $H_1\colon \theta > \theta_0$, $\theta_0 \in \Theta$, any test of the form

$$\varphi(x) = \begin{cases} 1 & \text{if } T(x) > t_0, \\ \gamma & \text{if } T(x) = t_0, \\ 0 & \text{if } T(x) < t_0, \end{cases} \tag{2}$$

has a nondecreasing power function and is UMP of its size $E_{\theta_0}\varphi(X) = \alpha$ (provided that
the size is not 0).

Moreover, for every $0 \leq \alpha \leq 1$ and every $\theta_0 \in \Theta$, there exists a $t_0$, $-\infty \leq t_0 \leq \infty$, and
$0 \leq \gamma \leq 1$ such that the test described in (2) is the UMP size $\alpha$ test of $H_0$ against $H_1$.


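Proof. Let $\theta_1, \theta_2 \in \Theta$, $\theta_1 < \theta_2$. By the fundamental lemma any test of the form

$$\varphi(x) = \begin{cases} 1, & \lambda(x) > k, \\ \gamma(x), & \lambda(x) = k, \\ 0, & \lambda(x) < k, \end{cases} \tag{3}$$

where $\lambda(x) = f_{\theta_2}(x)/f_{\theta_1}(x)$, is MP of its size for testing $\theta = \theta_1$ against $\theta = \theta_2$, provided
that $0 \leq k < \infty$; and if $k = \infty$, the test

$$\varphi(x) = \begin{cases} 1, & \text{if } f_{\theta_1}(x) = 0, \\ 0, & \text{if } f_{\theta_1}(x) > 0, \end{cases} \tag{4}$$

is MP of size 0. Since $f_\theta$ has an MLR in $T$, it follows that any test of form (2) is also of
form (3), provided that $E_{\theta_1}\varphi(X) > 0$, that is, provided that its size is $> 0$. The trivial test
$\varphi'(x) \equiv \alpha$ has size $\alpha$ and power $\alpha$, so that the power of any test (2) is at least $\alpha$, that is,

$$E_{\theta_2}\varphi(X) \geq E_{\theta_2}\varphi'(X) = \alpha = E_{\theta_1}\varphi(X).$$

It follows that, if $\theta_1 < \theta_2$ and $E_{\theta_1}\varphi(X) > 0$, then $E_{\theta_1}\varphi(X) \leq E_{\theta_2}\varphi(X)$, as asserted.

Let $\theta_1 = \theta_0$ and $\theta_2 > \theta_0$, as above. We know that (2) is an MP test of its size $E_{\theta_0}\varphi(X)$
for testing $\theta = \theta_0$ against $\theta = \theta_2$ ($\theta_2 > \theta_0$), provided that $E_{\theta_0}\varphi(X) > 0$. Since the power
function of $\varphi$ is nondecreasing,

$$E_\theta\varphi(X) \leq E_{\theta_0}\varphi(X) = \alpha_0 \quad \text{for all } \theta \leq \theta_0. \tag{5}$$

Since, however, $\varphi$ does not depend on $\theta_2$ (it depends only on constants $k$ and $\gamma$), it follows
that $\varphi$ is the UMP size $\alpha_0$ test for testing $\theta = \theta_0$ against $\theta > \theta_0$. Thus $\varphi$ is UMP among
the class of tests $\varphi''$ for which

$$E_{\theta_0}\varphi''(X) \leq E_{\theta_0}\varphi(X) = \alpha_0. \tag{6}$$

Now the class of tests satisfying (5) is contained in the class of tests satisfying (6)
[there are more restrictions in (5)]. It follows that $\varphi$, which is UMP in the larger class
satisfying (6), must also be UMP in the smaller class satisfying (5). Thus, provided that
$\alpha_0 > 0$, $\varphi$ is the UMP size $\alpha_0$ test for $\theta \leq \theta_0$ against $\theta > \theta_0$.

We ask the reader to complete the proof of the final part of the theorem, using the
fundamental lemma.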
Remark 2. By interchanging inequalities throughout in Theorem 2, we see that this
theorem also provides a solution of the dual problem $H_0'\colon \theta \geq \theta_0$ against $H_1'\colon \theta < \theta_0$.


Example 3. Let $X$ have the hypergeometric PMF

$$P_M\{X = x\} = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \ldots, M.$$

Since

$$\frac{P_{M+1}\{X = x\}}{P_M\{X = x\}} = \frac{M+1}{N-M} \cdot \frac{N-M-n+x}{M+1-x},$$

we see that $\{P_M\}$ has an MLR in $x$ ($P_{M_2}/P_{M_1}$, where $M_2 > M_1$, is just a product of such
ratios). It follows that there exists a UMP test of $H_0\colon M \leq M_0$ against $H_1\colon M > M_0$, which
rejects $H_0$ when $X$ is too large, that is, the UMP size $\alpha$ test is given by

$$\varphi(x) = \begin{cases} 1, & x > k, \\ \gamma, & x = k, \\ 0, & x < k, \end{cases}$$

where (integer) $k$ and $\gamma$ are determined from

$$E_{M_0}\varphi(X) = \alpha.$$
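
To make the hypergeometric test concrete, the following Python sketch finds $k$ and $\gamma$ for given $N$, $M_0$, $n$, and $\alpha$, and evaluates the rejection probability at other values of $M$. The function names and the numbers $N = 20$, $M_0 = 5$, $n = 6$ are hypothetical choices made only for illustration.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P_M{X = x}: x marked items in a sample of n from N items, M of them marked."""
    if x < max(0, n - (N - M)) or x > min(n, M):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def ump_cutoff(N, M0, n, alpha):
    """Return (k, gamma) with P_{M0}(X > k) + gamma * P_{M0}(X = k) = alpha."""
    tail = 0.0  # running value of P_{M0}(X > k)
    for k in range(n, -1, -1):
        p = hypergeom_pmf(k, N, M0, n)
        if tail + p > alpha:
            return k, (alpha - tail) / p
        tail += p
    return -1, 0.0  # only reached when alpha >= 1

def rejection_prob(N, M, n, k, gamma):
    """E_M phi(X) for the test that rejects if X > k and with probability gamma if X = k."""
    upper = sum(hypergeom_pmf(x, N, M, n) for x in range(k + 1, n + 1))
    return upper + gamma * hypergeom_pmf(k, N, M, n)

k, gamma = ump_cutoff(N=20, M0=5, n=6, alpha=0.05)
print(k, gamma, rejection_prob(20, 5, 6, k, gamma))
```

Evaluating `rejection_prob` over $M = 0, 1, \ldots, N$ gives a nondecreasing sequence, which is the power-function property asserted in Theorem 2.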


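For the one-parameter exponential family UMP tests exist also for some two-sided
hypotheses of the form

$$H_0\colon \theta \leq \theta_1 \ \text{ or } \ \theta \geq \theta_2 \quad (\theta_1 < \theta_2). \tag{7}$$

We state the following result without proof.


Theorem 3. For the one-parameter exponential family (1), there exists a UMP test of the
hypothesis $H_0\colon \theta \leq \theta_1$ or $\theta \geq \theta_2$ ($\theta_1 < \theta_2$) against $H_1\colon \theta_1 < \theta < \theta_2$ that is of the form

$$\varphi(x) = \begin{cases} 1 & \text{if } c_1 < T(x) < c_2, \\ \gamma_i & \text{if } T(x) = c_i, \ i = 1, 2 \ (c_1 < c_2), \\ 0 & \text{if } T(x) < c_1 \text{ or } > c_2, \end{cases} \tag{8}$$

where the $c$'s and the $\gamma$'s are given by

$$E_{\theta_1}\varphi(X) = E_{\theta_2}\varphi(X) = \alpha. \tag{9}$$

See Lehmann [64, pp. 101-103], for proof.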
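Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$ RVs. To test $H_0\colon \mu \leq \mu_0$ or $\mu \geq \mu_1$ ($\mu_1 > \mu_0$)
against $H_1\colon \mu_0 < \mu < \mu_1$, the UMP test is given by

$$\varphi(x) = \begin{cases} 1 & \text{if } c_1 < \sum x_i < c_2, \\ \gamma_i & \text{if } \sum x_i = c_1 \text{ or } c_2, \\ 0 & \text{if } \sum x_i < c_1 \text{ or } > c_2, \end{cases}$$

where we determine $c_1, c_2$ from

$$\alpha = P_{\mu_0}\Big\{c_1 < \sum X_i < c_2\Big\} = P_{\mu_1}\Big\{c_1 < \sum X_i < c_2\Big\}$$

and $\gamma_1 = \gamma_2 = 0$. Thus

$$\alpha = P\left\{\frac{c_1 - n\mu_0}{\sqrt{n}} < \frac{\sum X_i - n\mu_0}{\sqrt{n}} < \frac{c_2 - n\mu_0}{\sqrt{n}}\right\} = P\left\{\frac{c_1 - n\mu_1}{\sqrt{n}} < \frac{\sum X_i - n\mu_1}{\sqrt{n}} < \frac{c_2 - n\mu_1}{\sqrt{n}}\right\}$$

$$= P\left\{\frac{c_1 - n\mu_0}{\sqrt{n}} < Z < \frac{c_2 - n\mu_0}{\sqrt{n}}\right\} = P\left\{\frac{c_1 - n\mu_1}{\sqrt{n}} < Z < \frac{c_2 - n\mu_1}{\sqrt{n}}\right\},$$

where $Z$ is $N(0,1)$. Given $\alpha$, $n$, $\mu_0$, and $\mu_1$, we can solve for $c_1$ and $c_2$ from the simultaneous
equations

$$\alpha = \Phi\left(\frac{c_2 - n\mu_0}{\sqrt{n}}\right) - \Phi\left(\frac{c_1 - n\mu_0}{\sqrt{n}}\right), \qquad \alpha = \Phi\left(\frac{c_2 - n\mu_1}{\sqrt{n}}\right) - \Phi\left(\frac{c_1 - n\mu_1}{\sqrt{n}}\right),$$

where $\Phi$ is the DF of $Z$.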
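
The two size equations for $c_1$ and $c_2$ have no closed-form solution, but they are easy to solve numerically. The sketch below rests on one assumption that is not stated in the text: it looks for an interval symmetric about $m = n(\mu_0 + \mu_1)/2$, in which case the two equations coincide (by the symmetry of $\Phi$) and a single half-width $d$ remains to be found by bisection. The function names and the sample inputs are illustrative.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # DF of Z ~ N(0, 1)

def middle_prob(c1, c2, n, mu):
    """P_mu{c1 < sum X_i < c2} for iid N(mu, 1) observations."""
    return PHI((c2 - n * mu) / sqrt(n)) - PHI((c1 - n * mu) / sqrt(n))

def solve_c1_c2(alpha, n, mu0, mu1, tol=1e-12):
    """Solve both size equations by bisection on the half-width d.

    Assumption: c1 = m - d and c2 = m + d with m = n*(mu0 + mu1)/2, so that
    the equation at mu0 and the equation at mu1 are the same equation in d.
    """
    m = n * (mu0 + mu1) / 2.0
    lo = 0.0
    hi = n * (mu1 - mu0) / 2.0 + 40.0 * sqrt(n)  # wide enough that the probability is ~1
    while hi - lo > tol:
        d = 0.5 * (lo + hi)
        if middle_prob(m - d, m + d, n, mu0) < alpha:
            lo = d  # interval too narrow
        else:
            hi = d  # interval wide enough
    d = 0.5 * (lo + hi)
    return m - d, m + d

c1, c2 = solve_c1_c2(alpha=0.05, n=10, mu0=0.0, mu1=1.0)
print(c1, c2, middle_prob(c1, c2, 10, 0.0), middle_prob(c1, c2, 10, 1.0))
```

Both printed probabilities should agree with $\alpha$ up to the bisection tolerance, confirming that the pair $(c_1, c_2)$ satisfies the two equations simultaneously.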


Remark 3. We caution the reader that UMP tests for testing $H_0\colon \theta_1 \leq \theta \leq \theta_2$ and
$H_0'\colon \theta = \theta_0$ for the one-parameter exponential family do not exist. An example will suffice.


Example 5. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(0, \sigma^2)$. Since the family of joint PDFs
of $X = (X_1, \ldots, X_n)$ has an MLR in $T(X) = \sum_{i=1}^n X_i^2$, it follows that UMP tests exist for
one-sided hypotheses $\sigma \geq \sigma_0$ and $\sigma \leq \sigma_0$.

Consider now the null hypothesis $H_0\colon \sigma = \sigma_0$ against the alternative $H_1\colon \sigma \neq \sigma_0$. We
will show that a UMP test of $H_0$ does not exist. For testing $\sigma = \sigma_0$ against $\sigma > \sigma_0$, a test
of the form

$$\varphi_1(x) = \begin{cases} 1 & \text{if } \sum x_i^2 > c_1, \\ 0 & \text{otherwise} \end{cases}$$

is UMP, and for testing $\sigma = \sigma_0$ against $\sigma < \sigma_0$, a test of the form

$$\varphi_2(x) = \begin{cases} 1 & \text{if } \sum x_i^2 < c_2, \\ 0 & \text{otherwise} \end{cases}$$

is UMP. If the size is chosen as $\alpha$, then $c_1 = \sigma_0^2\chi^2_{n,\alpha}$ and $c_2 = \sigma_0^2\chi^2_{n,1-\alpha}$. Clearly, neither $\varphi_1$
nor $\varphi_2$ is UMP for $H_0$ against $H_1\colon \sigma \neq \sigma_0$. The power of any test of $H_0$ for values $\sigma > \sigma_0$
cannot exceed that of $\varphi_1$, and for values of $\sigma < \sigma_0$ it cannot exceed the power of test $\varphi_2$.
Hence no test of $H_0$ can be UMP (see Fig. 1).


PROBLEMS 9.4 


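1. For the following families of PMFs (PDFs) $f_\theta(x)$, $\theta \in \Theta \subseteq \mathbb{R}$, find a UMP size $\alpha$ test
of $H_0\colon \theta \leq \theta_0$ against $H_1\colon \theta > \theta_0$, based on a sample of $n$ observations.

(a) $f_\theta(x) = \theta^x(1-\theta)^{1-x}$, $x = 0, 1$; $0 < \theta < 1$.
(b) $f_\theta(x) = (1/\sqrt{2\pi})\exp\{-(x-\theta)^2/2\}$, $-\infty < x < \infty$, $-\infty < \theta < \infty$.
(c) $f_\theta(x) = e^{-\theta}(\theta^x/x!)$, $x = 0, 1, 2, \ldots$; $\theta > 0$.
(d) $f_\theta(x) = (1/\theta)e^{-x/\theta}$, $x > 0$, $\theta > 0$.
(e) $f_\theta(x) = [1/\Gamma(\theta)]\,x^{\theta-1}e^{-x}$, $x > 0$, $\theta > 0$.
(f) $f_\theta(x) = \theta x^{\theta-1}$, $0 < x < 1$, $\theta > 0$.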


Fig. 1. Power functions of chi-square tests of $H_0\colon \sigma = \sigma_0$ against $H_1$.
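2. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from the PMF

$$P_N(x) = \frac{1}{N}, \quad x = 1, 2, \ldots, N; \ N \in \{1, 2, \ldots\}.$$

(a) Show that the test

$$\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, x_2, \ldots, x_n) > N_0, \\ \alpha & \text{if } \max(x_1, x_2, \ldots, x_n) \leq N_0 \end{cases}$$

is UMP size $\alpha$ for testing $H_0\colon N \leq N_0$ against $H_1\colon N > N_0$.

(b) Show that

$$\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, x_2, \ldots, x_n) > N_0 \text{ or } \max(x_1, x_2, \ldots, x_n) \leq \alpha^{1/n}N_0, \\ 0 & \text{otherwise,} \end{cases}$$

is a UMP size $\alpha$ test of $H_0'\colon N = N_0$ against $H_1'\colon N \neq N_0$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from $U(0, \theta)$, $\theta > 0$. Show that the test

$$\varphi_1(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, \ldots, x_n) > \theta_0, \\ \alpha & \text{if } \max(x_1, x_2, \ldots, x_n) \leq \theta_0 \end{cases}$$

is UMP size $\alpha$ for testing $H_0\colon \theta \leq \theta_0$ against $H_1\colon \theta > \theta_0$ and that the test

$$\varphi_2(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, \ldots, x_n) > \theta_0 \text{ or } \max(x_1, x_2, \ldots, x_n) \leq \theta_0\alpha^{1/n}, \\ 0 & \text{otherwise} \end{cases}$$

is UMP size $\alpha$ for $H_0'\colon \theta = \theta_0$ against $H_1'\colon \theta \neq \theta_0$.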
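4. Does the Laplace family of PDFs

$$f_\theta(x) = \frac{1}{2}\exp\{-|x-\theta|\}, \quad -\infty < x < \infty, \ \theta \in \mathbb{R},$$

possess an MLR?

5. Let $X$ have logistic distribution with PDF

$$f_\theta(x) = e^{-(x-\theta)}\{1 + e^{-(x-\theta)}\}^{-2}, \quad x \in \mathbb{R}.$$

Does $\{f_\theta\}$ belong to the exponential family? Does $\{f_\theta\}$ have MLR?

6. (a) Let $f_\theta$ be the PDF of a $N(\theta, \theta)$ RV. Does $\{f_\theta\}$ have MLR?
(b) Do the same as in (a) if $X \sim N(\theta, \theta^2)$.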
9.5 UNBIASED AND INVARIANT TESTS


We have seen that, if we restrict ourselves to the class $\Phi_\alpha$ of all size $\alpha$ tests, there do not
exist UMP tests for many important hypotheses. This suggests that we reduce the class of
tests under consideration by imposing certain restrictions.


Definition 1. A size $\alpha$ test $\varphi$ of $H_0\colon \theta \in \Theta_0$ against the alternatives $H_1\colon \theta \in \Theta_1$ is said to
be unbiased if

$$E_\theta\varphi(X) \geq \alpha \quad \text{for all } \theta \in \Theta_1. \tag{1}$$

It follows that a test $\varphi$ is unbiased if and only if its power function $\beta_\varphi(\theta)$ satisfies

$$\beta_\varphi(\theta) \leq \alpha \quad \text{for } \theta \in \Theta_0 \tag{2}$$

and

$$\beta_\varphi(\theta) \geq \alpha \quad \text{for } \theta \in \Theta_1. \tag{3}$$

This seems to be a reasonable requirement to place on a test. An unbiased test rejects a
false $H_0$ more often than a true $H_0$.


Definition 2. Let $U_\alpha$ be the class of all unbiased size $\alpha$ tests of $H_0$. If there exists a test
$\varphi \in U_\alpha$ that has maximum power at each $\theta \in \Theta_1$, we call $\varphi$ a UMP unbiased size $\alpha$ test.


Clearly $U_\alpha \subset \Phi_\alpha$. If a UMP test exists in $\Phi_\alpha$, it is UMP in $U_\alpha$. This follows by comparing
the power of the UMP test with that of the trivial test $\varphi(x) \equiv \alpha$. It is convenient to
introduce another class of tests.


Definition 3. A test $\varphi$ is said to be $\alpha$-similar on a subset $\Theta^*$ of $\Theta$ if

$$\beta_\varphi(\theta) = E_\theta\varphi(X) = \alpha \quad \text{for } \theta \in \Theta^*. \tag{4}$$

A test is said to be similar on a set $\Theta^* \subseteq \Theta$ if it is $\alpha$-similar on $\Theta^*$ for some $\alpha$, $0 \leq \alpha \leq 1$.


It is clear that there exists at least one similar test on every $\Theta^*$, namely, $\varphi(x) \equiv \alpha$,
$0 \leq \alpha \leq 1$.


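Theorem 1. Let $\beta_\varphi(\theta)$ be continuous in $\theta$ for any $\varphi$. If $\varphi$ is an unbiased size $\alpha$ test of
$H_0\colon \theta \in \Theta_0$ against $H_1\colon \theta \in \Theta_1$, it is $\alpha$-similar on the boundary $\Lambda = \bar\Theta_0 \cap \bar\Theta_1$. (Here $\bar A$ is
the closure of set $A$.)


Proof. Let $\theta \in \Lambda$. Then there exists a sequence $\{\theta_n\}$, $\theta_n \in \Theta_0$, such that $\theta_n \to \theta$. Since
$\beta_\varphi(\theta)$ is continuous, $\beta_\varphi(\theta_n) \to \beta_\varphi(\theta)$; and since $\beta_\varphi(\theta_n) \leq \alpha$ for $\theta_n \in \Theta_0$, $\beta_\varphi(\theta) \leq \alpha$.
Similarly, there exists a sequence $\{\theta_n'\}$, $\theta_n' \in \Theta_1$, such that $\beta_\varphi(\theta_n') \geq \alpha$ ($\varphi$ is unbiased) and
$\theta_n' \to \theta$. Thus $\beta_\varphi(\theta_n') \to \beta_\varphi(\theta)$, and it follows that $\beta_\varphi(\theta) \geq \alpha$. Hence $\beta_\varphi(\theta) = \alpha$ for $\theta \in \Lambda$,
and $\varphi$ is $\alpha$-similar on $\Lambda$.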
Remark 1. Thus, if $\beta_\varphi(\theta)$ is continuous in $\theta$ for any $\varphi$, an unbiased size $\alpha$ test of $H_0$
against $H_1$ is also $\alpha$-similar for the PDFs (PMFs) of $\Lambda$, that is, for $\{f_\theta, \theta \in \Lambda\}$. If we can
find an MP similar test of $H_0\colon \theta \in \Lambda$ against $H_1$, and if this test is unbiased size $\alpha$, then
necessarily it is MP in the smaller class.


Definition 4. A test $\varphi$ that is UMP among all $\alpha$-similar tests on the boundary $\Lambda = \bar\Theta_0 \cap \bar\Theta_1$
is said to be a UMP $\alpha$-similar test.


It is frequently easier to find a UMP $\alpha$-similar test. Moreover, tests that are UMP similar
on the boundary are often UMP unbiased.


Theorem 2. Let the power function of every test $\varphi$ of $H_0\colon \theta \in \Theta_0$ against $H_1\colon \theta \in \Theta_1$ be
continuous in $\theta$. Then a UMP $\alpha$-similar test is UMP unbiased, provided that its size is $\alpha$
for testing $H_0$ against $H_1$.


Proof. Let $\varphi_0$ be UMP $\alpha$-similar. Then $E_\theta\varphi_0(X) \leq \alpha$ for $\theta \in \Theta_0$. Comparing its power
with that of the trivial similar test $\varphi(x) \equiv \alpha$, we see that $\varphi_0$ is unbiased also. By the
continuity of $\beta_\varphi(\theta)$ we see that the class of all unbiased size $\alpha$ tests is a subclass of the
class of all $\alpha$-similar tests. It follows that $\varphi_0$ is a UMP unbiased size $\alpha$ test.


Remark 2. The continuity of power function $\beta_\varphi(\theta)$ is not always easy to check but
sufficient conditions may be found in most advanced calculus texts. See, for example,
Widder [117, p. 356]. If the family of PDFs (PMFs) $f_\theta$ is an exponential family then a proof
is given in Lehmann [64, p. 59].


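Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$. We wish to test $H_0\colon \mu \leq 0$
against $H_1\colon \mu > 0$. Since the family of densities has an MLR in $\sum_{i=1}^n X_i$, we can use
Theorem 9.4.2 to conclude that a UMP test rejects $H_0$ if $\sum_{i=1}^n X_i > c$. This test is also UMP
unbiased. Nevertheless we use this example to illustrate the concepts introduced above.

Here $\Theta_0 = \{\mu \leq 0\}$, $\Theta_1 = \{\mu > 0\}$, and $\Lambda = \bar\Theta_0 \cap \bar\Theta_1 = \{\mu = 0\}$. Since $T(X) = \sum_{i=1}^n X_i$
is sufficient, we focus attention on tests based on $T$ alone. Note that $T \sim N(n\mu, n)$, which is
one-parameter exponential. Thus the power function of any test $\varphi$ based on $T$ is continuous
in $\mu$. It follows that any unbiased size $\alpha$ test of $H_0$ has the property $\beta_\varphi(0) = \alpha$ of similarity
over $\Lambda$. In order to use Theorem 2, we find a UMP test of $H_0'\colon \mu \in \Lambda$ against $H_1$. Let $\mu_1 > 0$.
By the fundamental lemma an MP test of $\mu = 0$ against $\mu = \mu_1 > 0$ is given by

$$\varphi(t) = \begin{cases} 1 & \text{if } \exp\left\{\dfrac{t^2}{2n} - \dfrac{(t - n\mu_1)^2}{2n}\right\} > k', \\ 0 & \text{otherwise,} \end{cases} \;=\; \begin{cases} 1 & \text{if } t > k, \\ 0 & \text{if } t \leq k, \end{cases}$$

where $k$ is determined from

$$\alpha = P_0\{T > k\} = P\left\{Z > \frac{k}{\sqrt{n}}\right\}.$$

Thus $k = \sqrt{n}\,z_\alpha$. Since $\varphi$ is independent of $\mu_1$ as long as $\mu_1 > 0$, we see that the test

$$\varphi(t) = \begin{cases} 1, & t > \sqrt{n}\,z_\alpha, \\ 0, & \text{otherwise,} \end{cases}$$

is UMP $\alpha$-similar. We need only check that $\varphi$ is of the right size for testing $H_0$ against $H_1$.
We have, for $\mu \leq 0$,

$$E_\mu\varphi(T) = P_\mu\{T > \sqrt{n}\,z_\alpha\} = P\left\{\frac{T - n\mu}{\sqrt{n}} > z_\alpha - \sqrt{n}\,\mu\right\} \leq P\{Z > z_\alpha\},$$

since $-\sqrt{n}\,\mu \geq 0$. Here $Z$ is $N(0,1)$. It follows that

$$E_\mu\varphi(T) \leq \alpha \quad \text{for } \mu \leq 0,$$

hence $\varphi$ is UMP unbiased.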


Theorem 2 can be used only if it is possible to find a UMP $\alpha$-similar test. Unfortunately
this requires heavy use of conditional expectation, and we will not pursue the subject any
further. We refer to Lehmann [64, chapters 4 and 5] and Ferguson [28, pp. 224-233] for
further details.

Yet another reduction is obtained if we apply the principle of invariance to hypothesis
testing problems. We recall that a class of distributions is invariant under a group of transformations
$\mathcal{G}$ if for every $g \in \mathcal{G}$ and every $\theta \in \Theta$ there exists a unique $\theta' \in \Theta$ such that
$g(X)$ has distribution $P_{\theta'}$ whenever $X \sim P_\theta$. We write $\theta' = \bar g\theta$.

In a hypothesis testing problem we need to reformulate the principle of invariance.
First, we need to ensure that under transformations $\mathcal{G}$ not only does $\mathcal{P} = \{P_\theta\colon \theta \in \Theta\}$
remain invariant but also the problem of testing $H_0\colon \theta \in \Theta_0$ against $H_1\colon \theta \in \Theta_1$ remains
invariant. Second, since the problem has not changed by application of $\mathcal{G}$, the decision also
must not change.


Definition 5. A group $\mathcal{G}$ of transformations on the space of values of $X$ leaves a hypothesis
testing problem invariant if $\mathcal{G}$ leaves both $\{P_\theta\colon \theta \in \Theta_0\}$ and $\{P_\theta\colon \theta \in \Theta_1\}$ invariant.


Definition 6. We say that $\varphi$ is invariant under $\mathcal{G}$ if

$$\varphi(g(x)) = \varphi(x) \quad \text{for all } x \text{ and all } g \in \mathcal{G}.$$


Definition 7. Let $\mathcal{G}$ be a group of transformations on the space of values of the RV $X$. We
say that a statistic $T(x)$ is maximal invariant under $\mathcal{G}$ if (a) $T$ is invariant; (b) $T$ is maximal,
that is, $T(x_1) = T(x_2) \Rightarrow x_1 = g(x_2)$ for some $g \in \mathcal{G}$.
Example 2. Let $x = (x_1, x_2, \ldots, x_n)$, and $\mathcal{G}$ be the group of translations

$$g_c(x) = (x_1 + c, \ldots, x_n + c), \quad -\infty < c < \infty.$$

Here the space of values of $X$ is $\mathbb{R}_n$. Consider the statistic

$$T(x) = (x_n - x_1, \ldots, x_n - x_{n-1}).$$

Clearly,

$$T(g_c(x)) = (x_n - x_1, \ldots, x_n - x_{n-1}) = T(x).$$

If $T(x) = T(x')$, then $x_n - x_i = x_n' - x_i'$, $i = 1, 2, \ldots, n-1$, and we have $x_i - x_i' = x_n - x_n' = c$
($i = 1, 2, \ldots, n-1$), that is, $g_c(x') = (x_1' + c, \ldots, x_n' + c) = x$ and $T$ is maximal invariant.


Next consider the group of scale changes

$$g_c(x) = (cx_1, \ldots, cx_n), \quad c > 0.$$

Then

$$T(x) = \begin{cases} 0 & \text{if all } x_i = 0, \\ \left(\dfrac{x_1}{z}, \ldots, \dfrac{x_n}{z}\right) & \text{if at least one } x_i \neq 0, \quad z = \Big(\sum_{i=1}^n x_i^2\Big)^{1/2}, \end{cases}$$

is maximal invariant; for

$$T(g_c(x)) = T(cx_1, \ldots, cx_n) = T(x),$$

and if $T(x) = T(x')$, then either $T(x) = T(x') = 0$, in which case $x_i = x_i' = 0$, or $T(x) =
T(x') \neq 0$, in which case $x_i/z = x_i'/z'$, implying $x_i' = (z'/z)x_i = cx_i$, and $T$ is maximal.
Finally, if we consider the group of translation and scale changes,

$$g(x) = (ax_1 + b, \ldots, ax_n + b), \quad a > 0, \ -\infty < b < \infty,$$

a maximal invariant is

$$T(x) = \left(\frac{x_1 - \bar x}{s}, \frac{x_2 - \bar x}{s}, \ldots, \frac{x_n - \bar x}{s}\right),$$

where $\bar x = n^{-1}\sum_{i=1}^n x_i$ and $s^2 = n^{-1}\sum_{i=1}^n (x_i - \bar x)^2$.
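
The invariance half of each claim in Example 2 is easy to confirm numerically. The following Python sketch is only an illustration: the function names and the test vector are made up, and it checks invariance ($T(g(x)) = T(x)$) for the three groups without attempting to verify maximality.

```python
import math

def translation_invariant(x):
    """T(x) = (x_n - x_1, ..., x_n - x_{n-1}) for the translation group."""
    return [x[-1] - xi for xi in x[:-1]]

def scale_invariant(x):
    """T(x) = x / z with z = sqrt(sum x_i^2); taken to be all zeros when every x_i is 0."""
    z = math.sqrt(sum(xi * xi for xi in x))
    return [0.0] * len(x) if z == 0 else [xi / z for xi in x]

def location_scale_invariant(x):
    """T(x) = ((x_i - xbar) / s)_i with s^2 = n^{-1} sum (x_i - xbar)^2 (assumes s > 0)."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
    return [(xi - xbar) / s for xi in x]

def close(u, v, tol=1e-9):
    """Componentwise comparison of two vectors up to a small tolerance."""
    return len(u) == len(v) and all(abs(a - b) <= tol for a, b in zip(u, v))

x = [1.5, -0.3, 2.2, 0.7]
shifted = [xi + 4.0 for xi in x]       # g_c with c = 4
scaled = [3.0 * xi for xi in x]        # g_c with c = 3
affine = [2.5 * xi - 1.0 for xi in x]  # g with a = 2.5, b = -1

print(close(translation_invariant(shifted), translation_invariant(x)))
print(close(scale_invariant(scaled), scale_invariant(x)))
print(close(location_scale_invariant(affine), location_scale_invariant(x)))
```

Each statistic is unchanged only under its own group: for instance, `scale_invariant` does change under a translation, which is why the combined group needs the studentized statistic.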


Definition 8. Let $\mathcal{I}_\alpha$ denote the class of all invariant size $\alpha$ tests of $H_0\colon \theta \in \Theta_0$ against
$H_1\colon \theta \in \Theta_1$. If there exists a UMP member in $\mathcal{I}_\alpha$, we call the test a UMP invariant test of $H_0$
against $H_1$.

The search for UMP invariant tests is greatly facilitated by the use of the following
result.


Theorem 3. Let $T(x)$ be maximal invariant with respect to $\mathcal{G}$. Then $\varphi$ is invariant under
$\mathcal{G}$ if and only if $\varphi$ is a function of $T$.


Proof. Let $\varphi$ be invariant. We have to show that $T(x_1) = T(x_2) \Rightarrow \varphi(x_1) = \varphi(x_2)$. If
$T(x_1) = T(x_2)$, there is a $g \in \mathcal{G}$ such that $x_1 = g(x_2)$, so that $\varphi(x_1) = \varphi(g(x_2)) = \varphi(x_2)$.
Conversely, if $\varphi$ is a function of $T$, $\varphi(x) = h[T(x)]$, then

$$\varphi(g(x)) = h[T(g(x))] = h[T(x)] = \varphi(x),$$

and $\varphi$ is invariant.


Remark 3. The use of Theorem 3 is obvious. If a hypothesis testing problem is invariant
under a group $\mathcal{G}$, the principle of invariance restricts attention to invariant tests. According
to Theorem 3, it suffices to restrict attention to test functions that are functions of maximal
invariant $T$.


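Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are
unknown. We wish to test $H_0\colon \sigma \geq \sigma_0$, $-\infty < \mu < \infty$, against $H_1\colon \sigma < \sigma_0$, $-\infty <
\mu < \infty$. The family $\{N(\mu, \sigma^2)\}$ remains invariant under translations $x_i' = x_i + c$,
$-\infty < c < \infty$. Moreover, since $\operatorname{var}(X + c) = \operatorname{var}(X)$, the hypothesis testing problem
remains invariant under the group of translations, that is, both $\{N(\mu, \sigma^2)\colon \sigma^2 \geq \sigma_0^2\}$ and
$\{N(\mu, \sigma^2)\colon \sigma^2 < \sigma_0^2\}$ remain invariant. The joint sufficient statistic is $(\bar X, \sum (X_i - \bar X)^2)$,
which is transformed to $(\bar X + c, \sum (X_i - \bar X)^2)$ under translations. A maximal invariant is
$\sum (X_i - \bar X)^2$. It follows that the class of invariant tests consists of tests that are functions
of $\sum (X_i - \bar X)^2$.

Now $\sum (X_i - \bar X)^2/\sigma^2 \sim \chi^2(n-1)$, so that the PDF of $Z = \sum (X_i - \bar X)^2$ is given by

$$f_{\sigma^2}(z) = \frac{\sigma^{-(n-1)}}{\Gamma[(n-1)/2]\,2^{(n-1)/2}}\, z^{(n-3)/2}e^{-z/2\sigma^2}, \quad z > 0.$$

The family of densities $\{f_{\sigma^2}\colon \sigma^2 > 0\}$ has an MLR in $z$, and it follows that a UMP test is
to reject $H_0\colon \sigma \geq \sigma_0$ if $z \leq k$, that is, a UMP invariant test is given by

$$\varphi(x) = \begin{cases} 1 & \text{if } \sum (x_i - \bar x)^2 \leq k, \\ 0 & \text{if } \sum (x_i - \bar x)^2 > k, \end{cases}$$

where $k$ is determined from the size restriction

$$\alpha = P_{\sigma_0}\Big\{\sum (X_i - \bar X)^2 \leq k\Big\} = P\left\{\frac{\sum (X_i - \bar X)^2}{\sigma_0^2} \leq \frac{k}{\sigma_0^2}\right\},$$

that is,

$$k = \sigma_0^2\,\chi^2_{n-1,1-\alpha}.$$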
Example 4. Let $X$ have PDF $f_i(x_1 - \theta, \ldots, x_n - \theta)$ under $H_i$ ($i = 0, 1$), $-\infty < \theta < \infty$. Let
$\mathcal{G}$ be the group of translations

$$g_c(x) = (x_1 + c, \ldots, x_n + c), \quad -\infty < c < \infty, \ n \geq 2.$$

Clearly, $g$ induces $\bar g$ on $\Theta$, where $\bar g\theta = \theta + c$. The hypothesis testing problem remains
invariant under $\mathcal{G}$. A maximal invariant under $\mathcal{G}$ is $T(X) = (X_1 - X_n, \ldots, X_{n-1} - X_n) =
(T_1, T_2, \ldots, T_{n-1})$. The class of invariant tests coincides with the class of tests that are
functions of $T$. The PDF of $T$ under $H_i$ is independent of $\theta$ and is given by
$\int_{-\infty}^{\infty} f_i(t_1 + z, \ldots, t_{n-1} + z, z)\,dz$. The problem is thus reduced to testing a simple hypothesis against a
simple alternative. By the fundamental lemma the MP test

$$\varphi(t_1, t_2, \ldots, t_{n-1}) = \begin{cases} 1 & \text{if } \lambda(t) > c, \\ 0 & \text{if } \lambda(t) < c, \end{cases}$$

where $t = (t_1, t_2, \ldots, t_{n-1})$ and

$$\lambda(t) = \frac{\int_{-\infty}^{\infty} f_1(t_1 + z, t_2 + z, \ldots, t_{n-1} + z, z)\,dz}{\int_{-\infty}^{\infty} f_0(t_1 + z, t_2 + z, \ldots, t_{n-1} + z, z)\,dz},$$

is UMP invariant.
A particular case of Example 4 will be, for instance, to test $H_0\colon X \sim N(\theta, 1)$ against
$H_1\colon X \sim C(1, \theta)$, $\theta \in \mathbb{R}$. See Problem 1.


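Example 5. Suppose $(X, Y)$ has joint PDF

$$f_\theta(x, y) = \lambda\mu\exp\{-\lambda x - \mu y\}, \quad x > 0, \ y > 0,$$

and $= 0$ elsewhere, where $\theta = (\lambda, \mu)$, $\lambda > 0$, $\mu > 0$. Consider the scale group $\mathcal{G} =
\{\{0, c\}, c > 0\}$, which leaves $\{f_\theta\}$ invariant. Suppose we wish to test $H_0\colon \mu \geq \lambda$ against
$H_1\colon \mu < \lambda$. It is easy to see that $\bar{\mathcal{G}}\Theta_0 = \Theta_0$ so that $\mathcal{G}$ leaves $(\alpha, \Theta_0, \Theta_1)$ invariant and
$T = Y/X$ is maximal invariant. The PDF of $T$ is given by

$$f_\theta^T(t) = \frac{\lambda\mu}{(\lambda + \mu t)^2}, \quad t > 0, \qquad = 0 \ \text{for } t \leq 0.$$

The family $\{f_\theta^T\}$ has MLR in $T$ and hence a UMP invariant test of $H_0$ is of the form

$$\varphi(t) = \begin{cases} 1, & t > c(\alpha), \\ \gamma, & t = c(\alpha), \\ 0, & t < c(\alpha), \end{cases}$$

where $c(\alpha)$ is determined from the size restriction on the boundary $\mu = \lambda$:

$$\alpha = \int_{c(\alpha)}^{\infty} \frac{1}{(1+t)^2}\,dt = \frac{1}{1 + c(\alpha)}, \quad \text{that is,} \quad c(\alpha) = \frac{1-\alpha}{\alpha}.$$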
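
The cutoff in Example 5 can be checked from the PDF of $T = Y/X$. Integrating $\lambda\mu/(\lambda + \mu t)^2$ from $c$ to $\infty$ gives $P_\theta\{T > c\} = \lambda/(\lambda + \mu c)$, which on the boundary $\mu = \lambda$ reduces to $1/(1+c)$. The Python sketch below uses these two closed forms; the function names are illustrative and the parameter values are arbitrary.

```python
def t_survival(c, lam, mu):
    """P(T > c) for T = Y/X with density lam*mu / (lam + mu*t)^2 on t > 0."""
    return lam / (lam + mu * c)

def cutoff(alpha):
    """c(alpha) solving P(T > c) = alpha on the boundary mu = lam."""
    return (1.0 - alpha) / alpha

alpha = 0.05
c = cutoff(alpha)
print(c, t_survival(c, lam=2.0, mu=2.0))  # size on the boundary mu = lam
print(t_survival(c, lam=2.0, mu=6.0))     # inside H0 (mu > lam)
print(t_survival(c, lam=2.0, mu=0.5))     # inside H1 (mu < lam)
```

Since $T$ has a continuous distribution, the randomization constant $\gamma$ plays no role here; the rejection probability depends on $\theta$ only through $\mu/\lambda$ and increases as that ratio decreases.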
PROBLEMS 9.5

1. To test $H_0\colon X \sim N(\theta, 1)$ against $H_1\colon X \sim C(1, \theta)$ a sample of size 2 is available on $X$.
Find a UMP invariant test of $H_0$ against $H_1$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $P(\lambda)$. Find a UMP unbiased size $\alpha$ test for
the null hypothesis $H_0\colon \lambda \leq \lambda_0$ against alternatives $\lambda > \lambda_0$ by the methods of this
section.

3. Let $X \sim NB(1; \theta)$. By the methods of this section find a UMP unbiased size $\alpha$ test
of $H_0\colon \theta \geq \theta_0$ against $H_1\colon \theta < \theta_0$.

4. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Consider the problem of testing $H_0\colon \mu \leq 0$
against $H_1\colon \mu > 0$:

(a) It suffices to restrict attention to the sufficient statistic $(U, V)$, where $U = \bar X$ and
$V = S^2$. Show that the problem of testing $H_0$ is invariant under $\mathcal{G} = \{\{a, 1\},
a \in \mathbb{R}\}$ and a maximal invariant is $T = U/\sqrt{V}$.

(b) Show that the distribution of $T$ has MLR and a UMP invariant test rejects $H_0$
when $T > c$.

5. Let $X_1, X_2, \ldots, X_n$ be iid RVs and let $H_0$ be that $X_i \sim N(\theta, 1)$, and $H_1$ be that the
common PDF is $f_\theta(x) = (1/2)\exp\{-|x - \theta|\}$. Find the form of the UMP invariant
test of $H_0$ against $H_1$.

6. Let $X_1, X_2, \ldots, X_n$ be iid RVs and suppose $H_0\colon X_i \sim N(0, 1)$ and $H_1\colon X_i \sim f_1(x) =
\exp\{-|x|\}/2$:

(a) Show that the problem of testing $H_0$ against $H_1$ is invariant under scale changes
$g_c(x) = cx$, $c > 0$, and a maximal invariant is $T(X) = (X_1/X_n, \ldots, X_{n-1}/X_n)$.

(b) Show that the MP invariant test rejects $H_0$ when

$$\frac{1 + \sum_{j=1}^{n-1} |Y_j|}{\Big(1 + \sum_{j=1}^{n-1} Y_j^2\Big)^{1/2}} < k,$$

where $Y_j = X_j/X_n$, $j = 1, 2, \ldots, n-1$, or equivalently when

$$\frac{\sum_{j=1}^{n} |X_j|}{\Big(\sum_{j=1}^{n} X_j^2\Big)^{1/2}} < k.$$


9.6 LOCALLY MOST POWERFUL TESTS 


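In the previous section we argued that whenever a UMP test does not exist, we restrict the
class of tests under consideration and then find a UMP test in the subclass. Yet another
approach when no UMP test exists is to restrict the parameter set to a subset of $\Theta_1$. In
most problems, the parameter values that are close to the null hypothesis are the hardest
to detect. Tests that have good power properties for "local alternatives" may also retain
good power properties for "nonlocal" alternatives.

Definition 1. Let $\Theta \subseteq \mathbb{R}$. Then a test $\varphi_0$ with power function $\beta_{\varphi_0}(\theta) = E_\theta\varphi_0(X)$ is said
to be a locally most powerful (LMP) test of $H_0\colon \theta \leq \theta_0$ against $H_1\colon \theta > \theta_0$ if there exists
a $\Delta > 0$ such that for any other test $\varphi$ with

$$\beta_\varphi(\theta_0) = \beta_{\varphi_0}(\theta_0) = \int \varphi(x)f_{\theta_0}(x)\,dx \tag{1}$$

we have

$$\beta_{\varphi_0}(\theta) \geq \beta_\varphi(\theta) \quad \text{for every } \theta \in (\theta_0, \theta_0 + \Delta]. \tag{2}$$

We assume that the tests under consideration have continuously differentiable power
function at $\theta = \theta_0$ and the derivative may be taken under the integral sign. In that case, an
LMP test maximizes

$$\frac{\partial}{\partial\theta}\beta_\varphi(\theta)\Big|_{\theta=\theta_0} = \beta_\varphi'(\theta)\Big|_{\theta=\theta_0} = \int \varphi(x)\,\frac{\partial}{\partial\theta}f_\theta(x)\Big|_{\theta=\theta_0}\,dx \tag{3}$$

subject to the size constraint (1). A slight extension of the Neyman-Pearson lemma
(Remark 9.3.2) implies that a test satisfying (1) and given by

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \dfrac{\partial}{\partial\theta}f_\theta(x)\Big|_{\theta=\theta_0} > kf_{\theta_0}(x), \\ \gamma & \text{if } \dfrac{\partial}{\partial\theta}f_\theta(x)\Big|_{\theta=\theta_0} = kf_{\theta_0}(x), \\ 0 & \text{if } \dfrac{\partial}{\partial\theta}f_\theta(x)\Big|_{\theta=\theta_0} < kf_{\theta_0}(x) \end{cases} \tag{4}$$

will maximize $\beta_\varphi'(\theta_0)$. It is possible that a test that maximizes $\beta_\varphi'(\theta_0)$ is not LMP, but if
the test maximizes $\beta_\varphi'(\theta_0)$ and is unique then it must be an LMP test (see Kallenberg et al. [49,
p. 290] and Lehmann [64, p. 528]).

Note that for $x$ for which $f_{\theta_0}(x) \neq 0$ we can write

$$\frac{\frac{\partial}{\partial\theta}f_\theta(x)\big|_{\theta=\theta_0}}{f_{\theta_0}(x)} = \frac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta=\theta_0},$$

and then

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \dfrac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta=\theta_0} > k, \\ \gamma & \text{if } \dfrac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta=\theta_0} = k, \\ 0 & \text{if } \dfrac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta=\theta_0} < k. \end{cases} \tag{5}$$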

Example 1. Let $X_1, X_2, \ldots, X_n$ be iid with common normal PDF with mean $\mu$ and variance
$\sigma^2$. If one of these parameters is unknown while the other is known, the family of
PDFs has MLR and UMP tests exist for one-sided hypotheses for the unknown parameter.
Let us derive the LMP test in each case.

First consider the case when $\sigma^2$ is known, say $\sigma^2 = 1$, and $H_0\colon \mu \leq 0$, $H_1\colon \mu > 0$. Here
$\frac{\partial}{\partial\mu}\log f_\mu(x)\big|_{\mu=0} = \sum_{i=1}^n x_i$, so an easy computation using (5) shows that an LMP test is of the form

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i > k, \\ 0 & \text{if } \sum_{i=1}^n x_i \leq k, \end{cases}$$

which, of course, is the form of the UMP test obtained in Problem 9.4.1 by an application
of Theorem 9.4.2.

Next consider the case when $\mu$ is known, say $\mu = 0$, and $H_0\colon \sigma \leq \sigma_0$, $H_1\colon \sigma > \sigma_0$. Using
(5) we see that an LMP test is of the form

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i^2 > k, \\ 0 & \text{if } \sum_{i=1}^n x_i^2 \leq k, \end{cases}$$

which coincides with the UMP test.
In each case the power function is differentiable and the derivatives may be taken inside
the integral sign because the PDF is a one-parameter exponential type PDF.


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF

$$f_\theta(x) = \frac{1}{\pi}\,\frac{1}{1 + (x-\theta)^2}, \quad x \in \mathbb{R},$$

and consider the problem of testing $H_0\colon \theta \leq 0$ against $H_1\colon \theta > 0$.

In this case $\{f_\theta\}$ does not have MLR. A direct computation using the Neyman-Pearson
lemma shows that an MP test of $\theta = 0$ against $\theta = \theta_1$, $\theta_1 > 0$, depends on $\theta_1$ and hence
cannot be MP for testing $\theta = 0$ against $\theta = \theta_2$, $\theta_2 \neq \theta_1$. Hence a UMP test of $H_0$ against
$H_1$ does not exist. An LMP test of $H_0$ against $H_1$ is of the form

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n \dfrac{2x_i}{1 + x_i^2} > k, \\ 0 & \text{otherwise,} \end{cases}$$

where $k$ is chosen so that the size of $\varphi_0$ is $\alpha$. For small $n$ it is hard to compute $k$, but for
large $n$ it is easy to compute $k$ using the central limit theorem. Indeed $\{2X_i/(1 + X_i^2)\}$ are iid RVs
with mean 0 and finite variance ($= 4 \cdot 1/8 = 1/2$; see Problem 1) so that $k = z_\alpha\sqrt{n/2}$ will give an (approximate)
level $\alpha$ test for large $n$.
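
The large-sample version of this LMP test is simple to code. The sketch below is illustrative only (the function names are not from the text) and relies on the normal approximation: each term $2X_i/(1+X_i^2)$ has mean 0 and variance $1/2$ under $\theta = 0$, so the statistic has approximate variance $n/2$ and the cutoff is $k = z_\alpha\sqrt{n/2}$.

```python
from math import sqrt
from statistics import NormalDist

def lmp_statistic(xs):
    """Score statistic sum 2*x_i / (1 + x_i^2) of the Cauchy location family at theta = 0."""
    return sum(2.0 * x / (1.0 + x * x) for x in xs)

def approx_cutoff(n, alpha):
    """Large-sample cutoff k = z_alpha * sqrt(n/2), using variance 1/2 per term."""
    return NormalDist().inv_cdf(1.0 - alpha) * sqrt(n / 2.0)

def lmp_test(xs, alpha):
    """Return 1 (reject H0: theta <= 0) or 0, at approximate level alpha."""
    return 1 if lmp_statistic(xs) > approx_cutoff(len(xs), alpha) else 0

print(approx_cutoff(n=100, alpha=0.05))
print(lmp_statistic([1.0, -1.0, 0.5]))
```

Each term of the statistic is bounded by 1 in absolute value and tends to 0 as $|x_i| \to \infty$, so observations far from 0 contribute almost nothing; this is one way to see why the test loses power against alternatives far from the null.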

The test $\varphi_0$ is good at detecting small departures from $\theta \leq 0$ but it is quite unsatisfactory
in detecting values of $\theta$ away from 0. In fact, for $\alpha < 1/2$, $\beta_{\varphi_0}(\theta) \to 0$ as $\theta \to \infty$.

This procedure for finding locally best tests has applications in nonparametric statistics.
We refer the reader to Randles and Wolfe [85, section 9.1] for details.
PROBLEMS 9.6

1. Let $X_1, X_2, \ldots, X_n$ be iid $C(1, 0)$ RVs. Show that $E_0(1 + X_1^2)^{-k} = (1/\pi)B(k +
1/2, 1/2)$. Hence or otherwise show that

$$E_0\left[\frac{X_1^2}{(1 + X_1^2)^2}\right] = \operatorname{var}\left(\frac{X_1}{1 + X_1^2}\right) = \frac{1}{8}.$$

2. Let $X_1, X_2, \ldots, X_n$ be a random sample from logistic PDF

$$f_\theta(x) = \frac{1}{2\{1 + \cosh(x - \theta)\}} = \frac{e^{x-\theta}}{\{1 + e^{x-\theta}\}^2}.$$

Show that the LMP test of $H_0\colon \theta = 0$ against $H_1\colon \theta > 0$ rejects $H_0$ if $\sum_{i=1}^n \tanh(x_i/2) > k$.

3. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common Laplace PDF

$$f_\theta(x) = (1/2)\exp\{-|x - \theta|\}.$$

For $n \geq 2$ show that a UMP size $\alpha$ ($0 < \alpha < 1$) test of $H_0\colon \theta \leq 0$ against $H_1\colon \theta > 0$
does not exist. Find the form of the LMP test.


10 


SOME FURTHER RESULTS ON 
HYPOTHESES TESTING 


10.1 INTRODUCTION 


In this chapter we study some commonly used procedures in the theory of testing of 
hypotheses. In Section 10.2 we describe the classical procedure for constructing tests 
based on likelihood ratios. This method is sufficiently general to apply to multi-parameter 
problems and is specially useful in the presence of nuisance parameters. These are 
unknown parameters in the model which are of no inferential interest. Most of the normal 
theory tests described in Sections 10.3 to 10.5 and those in Chapter 12 can be derived 
by using methods of Section 10.2. In Sections 10.3 to 10.5 we list some commonly 
used normal theory-based tests. In Section 10.3 we also deal with goodness-of-fit tests. 
In Section 10.6 we look at the hypothesis testing problem from a decision-theoretic 
viewpoint and describe Bayes and minimax tests. 


10.2 GENERALIZED LIKELIHOOD RATIO TESTS


In Chapter 9 we saw that UMP tests do not exist for some problems of hypothesis testing. 
It was suggested that we restrict attention to smaller classes of tests and seek UMP tests in 
these subclasses or, alternatively, seek tests which are optimal against local alternatives. 
Unfortunately, some of the reductions suggested in Chapter 9, such as invariance, do not 
apply to all families of distributions. 

In this section we consider a classical procedure for constructing tests that has 
some intuitive appeal and that frequently, though not necessarily, leads to optimal 
tests. Also, the procedure leads to tests that have some desirable large-sample
properties.

Recall that for testing $H_0\colon X \sim f_0$ against $H_1\colon X \sim f_1$, the Neyman–Pearson MP test is
based on the ratio $f_1(x)/f_0(x)$. If we interpret the numerator as the best possible explanation
of $x$ under $H_1$, and the denominator as the best possible explanation of $x$ under $H_0$,
then it is reasonable to consider the ratio

\[
r(x) = \frac{\sup_{\theta\in\Theta_1} L(\theta; x)}{\sup_{\theta\in\Theta_0} L(\theta; x)}
     = \frac{\sup_{\theta\in\Theta_1} f_\theta(x)}{\sup_{\theta\in\Theta_0} f_\theta(x)}
\]

as a test statistic for testing $H_0\colon \theta \in \Theta_0$ against $H_1\colon \theta \in \Theta_1$. Here $L(\theta; x)$ is the likelihood
function of $x$. Note that for each $x$ for which the MLEs of $\theta$ under $\Theta_1$ and $\Theta_0$ exist the
ratio is well defined and free of $\theta$ and can be used as a test statistic. Clearly we should
reject $H_0$ if $r(x) > c$.

The statistic $r$ is hard to compute; only one of the two suprema in the ratio may be
attained.

Let $\theta \in \Theta \subseteq \mathcal{R}_k$ be a vector of parameters, and let $X$ be a random vector with PDF
(PMF) $f_\theta$. Consider the problem of testing the null hypothesis $H_0\colon X \sim f_\theta$, $\theta \in \Theta_0$ against
the alternative $H_1\colon X \sim f_\theta$, $\theta \in \Theta_1$.


Definition 1. For testing $H_0$ against $H_1$, a test of the form, reject $H_0$ if and only if
$\lambda(x) < c$, where $c$ is some constant, and

\[
\lambda(x) = \frac{\sup_{\theta\in\Theta_0} f_\theta(x_1, x_2, \ldots, x_n)}{\sup_{\theta\in\Theta} f_\theta(x_1, x_2, \ldots, x_n)},
\]

is called a generalized likelihood ratio (GLR) test.


We leave the reader to show that the statistics $\lambda(X)$ and $r(X)$ lead to the same criterion
for rejecting $H_0$.

The numerator of the likelihood ratio $\lambda$ is the best explanation of $X$ (in the sense of
maximum likelihood) that the null hypothesis $H_0$ can provide, and the denominator is the
best possible explanation of $X$. $H_0$ is rejected if there is a much better explanation of $X$
than the best one provided by $H_0$.

It is clear that $0 \le \lambda \le 1$. The constant $c$ is determined from the size restriction

\[
\sup_{\theta\in\Theta_0} P_\theta\{\lambda(X) < c\} = \alpha.
\]

If the distribution of $\lambda$ is continuous (that is, the DF is absolutely continuous), any size $\alpha$
is attainable. If, however, $\lambda(X)$ is a discrete RV, it may not be possible to find a likelihood
ratio test whose size exactly equals $\alpha$. This problem arises because of the nonrandomized
nature of the likelihood ratio test and can be handled by randomization. The following
result holds.


Theorem 1. If for given $\alpha$, $0 \le \alpha \le 1$, nonrandomized Neyman–Pearson and likelihood
ratio tests of a simple hypothesis against a simple alternative exist, they are equivalent.


Proof. The proof is left as an exercise.

Theorem 2. For testing $\theta \in \Theta_0$ against $\theta \in \Theta_1$, the likelihood ratio test is a function of
every sufficient statistic for $\theta$.


Theorem 2 follows from the factorization theorem for sufficient statistics.


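Example 1. Let $X \sim b(n,p)$, and we seek a level $\alpha$ likelihood ratio test of $H_0\colon p \le p_0$
against $H_1\colon p > p_0$:

\[
\lambda(x) = \frac{\sup_{p \le p_0} \binom{n}{x} p^x (1-p)^{n-x}}{\sup_{0 \le p \le 1} \binom{n}{x} p^x (1-p)^{n-x}}.
\]

Now

\[
\sup_{0 \le p \le 1} p^x(1-p)^{n-x} = \left(\frac{x}{n}\right)^{x}\left(1-\frac{x}{n}\right)^{n-x}.
\]

The function $p^x(1-p)^{n-x}$ first increases, then achieves its maximum at $p = x/n$, and
finally decreases, so that

\[
\sup_{p \le p_0} p^x(1-p)^{n-x} =
\begin{cases}
p_0^x(1-p_0)^{n-x} & \text{if } p_0 < \dfrac{x}{n},\\[6pt]
\left(\dfrac{x}{n}\right)^{x}\left(1-\dfrac{x}{n}\right)^{n-x} & \text{if } \dfrac{x}{n} \le p_0.
\end{cases}
\]

It follows that

\[
\lambda(x) =
\begin{cases}
\dfrac{p_0^x(1-p_0)^{n-x}}{(x/n)^x\,[1-(x/n)]^{n-x}} & \text{if } p_0 < \dfrac{x}{n},\\[8pt]
1 & \text{if } \dfrac{x}{n} \le p_0.
\end{cases}
\]

Note that $\lambda(x) \le 1$ for $np_0 < x$ and $\lambda(x) = 1$ if $x \le np_0$, and it follows that $\lambda(x)$ is a
decreasing function of $x$. Thus $\lambda(x) < c$ if and only if $x > c'$, and the GLR test rejects $H_0$
if $x > c'$.

The GLR test is of the type obtained in Section 9.4 for families with an MLR except
for the boundary $\lambda(x) = c$. In other words, if the size of the test happens to be exactly $\alpha$,
the likelihood ratio test is a UMP level $\alpha$ test. Since $X$ is a discrete RV, however, to obtain
size $\alpha$ may not be possible. We have

\[
\alpha = \sup_{p \le p_0} P_p\{X > c'\} = P_{p_0}\{X > c'\}.
\]

If such a $c'$ does not exist, we choose an integer $c'$ such that

\[
P_{p_0}\{X > c'\} \le \alpha \quad\text{and}\quad P_{p_0}\{X > c'-1\} > \alpha.
\]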
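
The construction in Example 1 can be traced numerically. The sketch below is illustrative only: the values $n = 20$, $p_0 = 0.3$, $\alpha = 0.05$ and the function names are assumptions, not taken from the text. It evaluates $\lambda(x)$ on the whole sample space, and finds the smallest integer $c'$ with $P_{p_0}\{X > c'\} \le \alpha$.

```python
from math import comb

def glr_binomial(x, n, p0):
    """lambda(x) for H0: p <= p0 against H1: p > p0 when X ~ b(n, p)."""
    phat = x / n
    if phat <= p0:
        return 1.0
    # Python evaluates 0.0 ** 0 as 1.0, which covers the case x == n.
    ratio = (p0 ** x * (1 - p0) ** (n - x)) / (phat ** x * (1 - phat) ** (n - x))
    return min(1.0, ratio)  # guard against rounding slightly above 1

def upper_tail(c, n, p):
    """P_p{X > c} for X ~ b(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(c + 1, n + 1))

def critical_value(n, p0, alpha):
    """Smallest integer c' with P_{p0}{X > c'} <= alpha."""
    c = 0
    while upper_tail(c, n, p0) > alpha:
        c += 1
    return c

n, p0, alpha = 20, 0.3, 0.05  # illustrative values, not from the text
lams = [glr_binomial(x, n, p0) for x in range(n + 1)]
c_prime = critical_value(n, p0, alpha)
print(c_prime, upper_tail(c_prime, n, p0), upper_tail(c_prime - 1, n, p0))
```

The list `lams` is nonincreasing in $x$, which is why rejecting for small $\lambda(x)$ amounts to rejecting for large $x$; the printed tail probabilities bracket $\alpha$ as in the last display of the example.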


The situation in Example 1 is not unique. For one-parameter exponential family it can
be shown (Birkes [7]) that a GLR test of $H_0\colon \theta \le \theta_0$ against $H_1\colon \theta > \theta_0$ is UMP of its
size. The result holds also for the dual $H_0'\colon \theta \ge \theta_0$ and, in fact, for a much wider class of
one-parameter family of distributions.

The GLR test is specially useful when $\theta$ is a multiparameter and we wish to test
hypothesis concerning one of the parameters. The remaining parameters act as nuisance
parameters.


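Example 2. Consider the problem of testing $\mu = \mu_0$ against $\mu \ne \mu_0$ in sampling from
$N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. In this case $\Theta_0 = \{(\mu_0, \sigma^2)\colon \sigma^2 > 0\}$ and
$\Theta = \{(\mu, \sigma^2)\colon -\infty < \mu < \infty,\ \sigma^2 > 0\}$. We write $\theta = (\mu, \sigma^2)$:

\[
\sup_{\theta\in\Theta_0} f_\theta(x) = \sup_{\sigma^2 > 0}\left[\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left\{-\frac{\sum_{i=1}^n (x_i-\mu_0)^2}{2\sigma^2}\right\}\right] = f_{\hat\sigma_0^2}(x),
\]

where $\hat\sigma_0^2$ is the MLE, $\hat\sigma_0^2 = (1/n)\sum_{i=1}^n (x_i - \mu_0)^2$. Thus

\[
\sup_{\theta\in\Theta_0} f_\theta(x) = \frac{e^{-n/2}}{(2\pi/n)^{n/2}\left\{\sum_{i=1}^n (x_i-\mu_0)^2\right\}^{n/2}}.
\]

The MLE of $\theta = (\mu, \sigma^2)$ when both $\mu$ and $\sigma^2$ are unknown is $\left(\sum_{i=1}^n x_i/n,\ \sum_{i=1}^n (x_i - \bar{x})^2/n\right)$.
It follows that

\[
\sup_{\theta\in\Theta} f_\theta(x) = \sup_{\mu,\sigma^2}\left[\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left\{-\frac{\sum_{i=1}^n (x_i-\mu)^2}{2\sigma^2}\right\}\right]
= \frac{e^{-n/2}}{(2\pi/n)^{n/2}\left\{\sum_{i=1}^n (x_i-\bar{x})^2\right\}^{n/2}}.
\]

Thus

\[
\lambda(x) = \left\{\frac{\sum_{i=1}^n (x_i-\bar{x})^2}{\sum_{i=1}^n (x_i-\mu_0)^2}\right\}^{n/2}
= \left\{\frac{1}{1 + \left[n(\bar{x}-\mu_0)^2 / \sum_{i=1}^n (x_i-\bar{x})^2\right]}\right\}^{n/2}.
\]

The GLR test rejects $H_0$ if

\[
\lambda(x) < c,
\]

and since $\lambda(x)$ is a decreasing function of $n(\bar{x}-\mu_0)^2/\sum_{i=1}^n (x_i-\bar{x})^2$, we reject $H_0$ if

\[
\left|\frac{\bar{x}-\mu_0}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2}}\right| > c',
\]

that is, if

\[
\left|\frac{\sqrt{n}\,(\bar{x}-\mu_0)}{s}\right| > c'',
\]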
where $s^2 = (n-1)^{-1}\sum_{i=1}^n (x_i - \bar{x})^2$. The statistic

\[
t(X) = \frac{\sqrt{n}\,(\bar{X}-\mu_0)}{S}
\]

has a $t$-distribution with $n-1$ d.f. Under $H_0\colon \mu = \mu_0$, $t(X)$ has a central $t(n-1)$
distribution, but under $H_1\colon \mu \ne \mu_0$, $t(X)$ has a noncentral $t$-distribution with $n-1$ d.f. and
noncentrality parameter $\delta = \sqrt{n}\,(\mu - \mu_0)/\sigma$. We choose $c'' = t_{n-1,\alpha/2}$ in accordance with
the distribution of $t(X)$ under $H_0$. Note that the two-sided $t$-test obtained here is UMP
unbiased. Similarly one can obtain one-sided $t$-tests also as likelihood ratio tests.
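
The link between the likelihood ratio and the $t$-statistic in Example 2 is the identity $\lambda(x) = \{1 + t^2/(n-1)\}^{-n/2}$. As a quick numerical check (a sketch only; the data and the value of $\mu_0$ below are made up for illustration and the function names are hypothetical), one can compute $\lambda$ from the two maximized likelihoods and again from $t$:

```python
import math

def normal_glr(xs, mu0):
    """GLR for H0: mu = mu0 in N(mu, sigma^2) with sigma^2 unknown, from the two MLE fits."""
    n = len(xs)
    xbar = sum(xs) / n
    ss_null = sum((x - mu0) ** 2 for x in xs)   # n times the MLE of sigma^2 under H0
    ss_full = sum((x - xbar) ** 2 for x in xs)  # n times the unrestricted MLE of sigma^2
    return (ss_full / ss_null) ** (n / 2)

def t_statistic(xs, mu0):
    """sqrt(n) * (xbar - mu0) / s with s^2 the sample variance (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return math.sqrt(n) * (xbar - mu0) / math.sqrt(s2)

xs = [4.9, 5.6, 5.1, 6.2, 5.8, 5.4, 4.7, 6.0]  # made-up data
mu0 = 5.0                                       # hypothetical null value
lam = normal_glr(xs, mu0)
t = t_statistic(xs, mu0)
lam_from_t = (1 + t * t / (len(xs) - 1)) ** (-len(xs) / 2)
print(lam, t, lam_from_t)
```

Because $\lambda$ depends on the data only through $t^2$ and decreases in it, the region $\lambda(x) < c$ is the two-sided region $|t| > c''$.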


The computations in Example 2 could be slightly simplified by using Theorem 2.
Indeed $T(X) = (\bar{X}, S^2)$ is a minimal sufficient statistic for $\theta$ and since $\bar{X}$ and $S^2$ are independent
the likelihood is the product of the PDFs of $\bar{X}$ and $S^2$. We note that $\bar{X} \sim N(\mu, \sigma^2/n)$
and $S^2 \sim \dfrac{\sigma^2}{n-1}\chi^2_{n-1}$. We leave it to the reader to carry out the details.


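Example 3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from
$N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. We wish to test the null hypothesis $H_0\colon \sigma_1^2 = \sigma_2^2$
against $H_1\colon \sigma_1^2 \ne \sigma_2^2$. Here

\[
\Theta = \{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)\colon -\infty < \mu_i < \infty,\ \sigma_i^2 > 0,\ i = 1,2\}
\]

and

\[
\Theta_0 = \{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)\colon -\infty < \mu_i < \infty,\ i = 1,2,\ \sigma_1^2 = \sigma_2^2 > 0\}.
\]

Let $\theta = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$. Then the joint PDF is

\[
f_\theta(x,y) = \frac{1}{(2\pi)^{(m+n)/2}\,\sigma_1^m \sigma_2^n}
\exp\left\{-\frac{1}{2\sigma_1^2}\sum_{i=1}^m (x_i-\mu_1)^2 - \frac{1}{2\sigma_2^2}\sum_{i=1}^n (y_i-\mu_2)^2\right\}.
\]

Also,

\[
\log f_\theta(x,y) = -\frac{m+n}{2}\log 2\pi - \frac{m}{2}\log\sigma_1^2 - \frac{n}{2}\log\sigma_2^2
- \frac{1}{2\sigma_1^2}\sum_{i=1}^m (x_i-\mu_1)^2 - \frac{1}{2\sigma_2^2}\sum_{i=1}^n (y_i-\mu_2)^2.
\]

Differentiating with respect to $\mu_1$ and $\mu_2$, we obtain the MLEs

\[
\hat\mu_1 = \bar{x} \quad\text{and}\quad \hat\mu_2 = \bar{y}.
\]

Differentiating with respect to $\sigma_1^2$ and $\sigma_2^2$, we obtain the MLEs

\[
\hat\sigma_1^2 = \frac{1}{m}\sum_{i=1}^m (x_i-\bar{x})^2 \quad\text{and}\quad \hat\sigma_2^2 = \frac{1}{n}\sum_{i=1}^n (y_i-\bar{y})^2.
\]

If, however, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the MLE of $\sigma^2$ is

\[
\hat\sigma^2 = \frac{\sum_{i=1}^m (x_i-\bar{x})^2 + \sum_{i=1}^n (y_i-\bar{y})^2}{m+n}.
\]

Thus

\[
\sup_{\theta\in\Theta_0} f_\theta(x,y) = \frac{e^{-(m+n)/2}}{[2\pi/(m+n)]^{(m+n)/2}\left\{\sum_{i=1}^m (x_i-\bar{x})^2 + \sum_{i=1}^n (y_i-\bar{y})^2\right\}^{(m+n)/2}}
\]

and

\[
\sup_{\theta\in\Theta} f_\theta(x,y) = \frac{e^{-(m+n)/2}}{(2\pi/m)^{m/2}(2\pi/n)^{n/2}\left\{\sum_{i=1}^m (x_i-\bar{x})^2\right\}^{m/2}\left\{\sum_{i=1}^n (y_i-\bar{y})^2\right\}^{n/2}},
\]

so that

\[
\lambda(x,y) = \left(\frac{m+n}{m}\right)^{m/2}\left(\frac{m+n}{n}\right)^{n/2}
\frac{\left\{\sum_{i=1}^m (x_i-\bar{x})^2\right\}^{m/2}\left\{\sum_{i=1}^n (y_i-\bar{y})^2\right\}^{n/2}}{\left\{\sum_{i=1}^m (x_i-\bar{x})^2 + \sum_{i=1}^n (y_i-\bar{y})^2\right\}^{(m+n)/2}}.
\]

Now

\[
\frac{\left\{\sum_{i=1}^m (x_i-\bar{x})^2\right\}^{m/2}\left\{\sum_{i=1}^n (y_i-\bar{y})^2\right\}^{n/2}}{\left\{\sum_{i=1}^m (x_i-\bar{x})^2 + \sum_{i=1}^n (y_i-\bar{y})^2\right\}^{(m+n)/2}}
= \frac{1}{\left\{1 + \dfrac{\sum_{i=1}^n (y_i-\bar{y})^2}{\sum_{i=1}^m (x_i-\bar{x})^2}\right\}^{m/2}\left\{1 + \dfrac{\sum_{i=1}^m (x_i-\bar{x})^2}{\sum_{i=1}^n (y_i-\bar{y})^2}\right\}^{n/2}}.
\]

Writing

\[
f = \frac{\sum_{i=1}^m (x_i-\bar{x})^2/(m-1)}{\sum_{i=1}^n (y_i-\bar{y})^2/(n-1)},
\]

we have

\[
\lambda(x,y) = \left(\frac{m+n}{m}\right)^{m/2}\left(\frac{m+n}{n}\right)^{n/2}
\frac{1}{\left\{1 + \dfrac{m-1}{n-1}\,f\right\}^{n/2}\left\{1 + \dfrac{n-1}{m-1}\cdot\dfrac{1}{f}\right\}^{m/2}}.
\]

We leave the reader to check that $\lambda(x,y) < c$ is equivalent to $f < c_1$ or $f > c_2$. (Take
logarithms and use properties of convex functions. Alternatively, differentiate $\log\lambda$.)
Under $H_0$, the statistic

\[
F = \frac{\sum_{i=1}^m (X_i-\bar{X})^2/(m-1)}{\sum_{i=1}^n (Y_i-\bar{Y})^2/(n-1)}
\]

has an $F(m-1, n-1)$ distribution, so that $c_1$, $c_2$ can be selected. It is usual to take

\[
P\{F \le c_1\} = P\{F \ge c_2\} = \frac{\alpha}{2}.
\]

Under $H_1$, $(\sigma_2^2/\sigma_1^2)F$ has an $F(m-1, n-1)$ distribution.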


In Example 3 we can obtain the same GLR test by focusing attention on the joint sufficient
statistic $(\bar{X}, \bar{Y}, S_X^2, S_Y^2)$ where $S_X^2$ and $S_Y^2$ are sample variances of the $X$'s and the $Y$'s,
respectively. In order to write down the likelihood function we note that $\bar{X}, \bar{Y}, S_X^2, S_Y^2$ are
independent RVs. The distributions of $\bar{X}$ and $S_X^2$ are the same as in Example 2 except that $m$
is the sample size. Distributions of $\bar{Y}$ and $S_Y^2$ require appropriate modifications. We leave
the reader to carry out the details. It turns out that the GLR test coincides with the UMP
unbiased test in this case.
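
To see why the rejection region in Example 3 is two-sided in $f$, it may help to evaluate $\lambda$ as a function of $f$. The sketch below uses made-up samples and hypothetical function names. It computes $\lambda$ from the sums of squares and again from $f$, and evaluates it at $f = m(n-1)/\{n(m-1)\}$, the point where the two unrestricted variance MLEs coincide.

```python
def sum_sq(zs):
    """Sum of squared deviations from the sample mean."""
    zbar = sum(zs) / len(zs)
    return sum((z - zbar) ** 2 for z in zs)

def scale_constant(m, n):
    return ((m + n) / m) ** (m / 2) * ((m + n) / n) ** (n / 2)

def glr_from_samples(xs, ys):
    """lambda(x, y) for H0: sigma1^2 = sigma2^2, written in terms of the sums of squares."""
    m, n = len(xs), len(ys)
    a, b = sum_sq(xs), sum_sq(ys)
    return scale_constant(m, n) * a ** (m / 2) * b ** (n / 2) / (a + b) ** ((m + n) / 2)

def glr_from_f(f, m, n):
    """The same lambda written as a function of the variance ratio f."""
    r = (m - 1) / (n - 1)
    return scale_constant(m, n) / ((1 + r * f) ** (n / 2) * (1 + 1 / (r * f)) ** (m / 2))

xs = [10.2, 9.1, 11.4, 10.8, 9.6, 10.0]            # made-up data, m = 6
ys = [8.0, 12.5, 9.9, 13.1, 7.4, 10.6, 11.8, 6.9]  # made-up data, n = 8
m, n = len(xs), len(ys)
f_obs = (sum_sq(xs) / (m - 1)) / (sum_sq(ys) / (n - 1))
f_peak = m * (n - 1) / (n * (m - 1))  # here the two variance MLEs are equal
print(glr_from_samples(xs, ys), glr_from_f(f_obs, m, n), glr_from_f(f_peak, m, n))
```

The value at `f_peak` is 1 and $\lambda$ falls off on both sides of it, so small values of $\lambda$ correspond to $f$ being either small or large.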

In certain situations the GLR test does not perform well. We reproduce here an example 
due to Stein and Rubin. 


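Example 4. Let $X$ be a discrete RV with PMF

\[
P_{p=0}\{X = x\} =
\begin{cases}
\dfrac{\alpha}{2} & \text{if } x = \pm 2,\\[6pt]
\dfrac{1-2\alpha}{2} & \text{if } x = \pm 1,\\[6pt]
\alpha & \text{if } x = 0,
\end{cases}
\]

under the null hypothesis $H_0\colon p = 0$, and

\[
P_p\{X = x\} =
\begin{cases}
pc & \text{if } x = -2,\\[4pt]
\dfrac{1-c}{1-\alpha}\left(\dfrac{1}{2}-\alpha\right) & \text{if } x = \pm 1,\\[6pt]
\alpha\,\dfrac{1-c}{1-\alpha} & \text{if } x = 0,\\[6pt]
(1-p)c & \text{if } x = 2,
\end{cases}
\]

under the alternative $H_1\colon p \in (0,1)$, where $\alpha$ and $c$ are constants with

\[
0 < \alpha < \frac{1}{2} \quad\text{and}\quad \frac{\alpha}{2-\alpha} < c < \alpha.
\]

To test the simple null hypothesis against the composite alternative at the level of
significance $\alpha$, let us compute the likelihood ratio $\lambda$. We have

\[
\lambda(2) = \frac{P_0\{X = 2\}}{\sup_{0 \le p < 1} P_p\{X = 2\}} = \frac{\alpha/2}{c} = \frac{\alpha}{2c}
\]

since $\alpha/2 < c$. Similarly $\lambda(-2) = \alpha/(2c)$. Also

\[
\lambda(1) = \lambda(-1) = \frac{\frac{1}{2}-\alpha}{\dfrac{1-c}{1-\alpha}\left(\dfrac{1}{2}-\alpha\right)} = \frac{1-\alpha}{1-c}
\]

and

\[
\lambda(0) = \frac{\alpha}{\alpha\,(1-c)/(1-\alpha)} = \frac{1-\alpha}{1-c}.
\]

The GLR test rejects $H_0$ if $\lambda(x) < k$, where $k$ is to be determined so that the level is $\alpha$. We
see that

\[
P_0\left\{\lambda(X) \le \frac{\alpha}{2c}\right\} = P_0\{X = \pm 2\} = \alpha,
\]

provided that $\alpha/(2c) < (1-\alpha)/(1-c)$. But $\alpha/(2-\alpha) < c < \alpha$ implies $\alpha < 2c - c\alpha$, so
that $\alpha - c\alpha < 2c - 2c\alpha$, or $\alpha(1-c) < 2c(1-\alpha)$, as required. Thus the GLR size $\alpha$ test is
to reject $H_0$ if $X = \pm 2$. The power of the GLR test is

\[
P_p\left\{\lambda(X) \le \frac{\alpha}{2c}\right\} = P_p\{X = \pm 2\} = pc + (1-p)c = c < \alpha
\]

for all $p \in (0,1)$. The test is not unbiased and is even worse than the trivial test $\varphi(x) \equiv \alpha$.
Another test that is better than the trivial test is to reject $H_0$ whenever $x = 0$ (this is
opposite to what the likelihood ratio test says). Then

\[
P_0\{X = 0\} = \alpha,
\]

\[
P_p\{X = 0\} = \alpha\,\frac{1-c}{1-\alpha} > \alpha \quad (\text{since } c < \alpha),
\]

for all $p \in (0,1)$, and the test is unbiased.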
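
The two tests in Example 4 are easy to compare numerically. In the following sketch the values $\alpha = 0.1$ and $c = 0.08$ are illustrative choices satisfying $\alpha/(2-\alpha) < c < \alpha$, and the function names are hypothetical; the PMFs are those of the example.

```python
def null_pmf(alpha):
    """PMF of X under H0."""
    return {-2: alpha / 2, -1: (1 - 2 * alpha) / 2, 0: alpha,
            1: (1 - 2 * alpha) / 2, 2: alpha / 2}

def alt_pmf(p, alpha, c):
    """PMF of X under H1 for a given p in (0, 1)."""
    side = (1 - c) / (1 - alpha) * (0.5 - alpha)
    return {-2: p * c, -1: side, 0: alpha * (1 - c) / (1 - alpha),
            1: side, 2: (1 - p) * c}

def rejection_probability(pmf, region):
    return sum(pmf[x] for x in region)

alpha, c = 0.1, 0.08          # illustrative; alpha/(2 - alpha) is about 0.0526 < c < alpha
glr_region = (-2, 2)          # where lambda(x) = alpha/(2c) is smallest
other_region = (0,)           # the competing test: reject when x = 0

lam_tail = alpha / (2 * c)             # lambda(+-2)
lam_rest = (1 - alpha) / (1 - c)       # lambda(+-1) = lambda(0)
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
glr_power = [rejection_probability(alt_pmf(p, alpha, c), glr_region) for p in grid]
other_power = [rejection_probability(alt_pmf(p, alpha, c), other_region) for p in grid]
print(lam_tail, lam_rest, glr_power, other_power)
```

Both tests have size $\alpha$ under $H_0$, but the GLR test has power $c < \alpha$ for every $p$, while the test that rejects at $x = 0$ has power $\alpha(1-c)/(1-\alpha) > \alpha$.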


We will use the generalized likelihood ratio procedure quite frequently hereafter 
because of its simplicity and wide applicability. The exact distribution of the test statistic 
under $H_0$ is generally difficult to obtain (despite what we saw in Examples 1 to 3 above)
and evaluation of power function is also not possible in many problems. Recall, however, 
that under certain conditions the asymptotic distribution of the MLE is normal. This result 
can be used to prove the following large-sample property of the GLR under Ho, which 
solves the problem of computation of the cut-off point c at least when the sample size is 
large. 


Theorem 3. Under some regularity conditions on $f_\theta(x)$, the random variable $-2\log\lambda(X)$
under $H_0$ is asymptotically distributed as a chi-square RV with degrees of freedom equal to
the difference between the number of independent parameters in $\Theta$ and the number in $\Theta_0$.


We will not prove this result here; the reader is referred to Wilks [118, p. 419]. The
regularity conditions are essentially the ones associated with Theorem 8.7.4. In Example 2
the number of parameters unspecified under $H_0$ is one (namely, $\sigma^2$), and under $H_1$ two
parameters are unspecified ($\mu$ and $\sigma^2$), so that the asymptotic chi-square distribution will
have 1 d.f. Similarly, in Example 3, the d.f. $= 4 - 3 = 1$.


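Example 5. In Example 2 we showed that, in sampling from a normal population with
unknown mean $\mu$ and unknown variance $\sigma^2$, the likelihood ratio for testing $H_0\colon \mu = \mu_0$
against $H_1\colon \mu \ne \mu_0$ is

\[
\lambda(x) = \left[1 + \frac{n(\bar{x}-\mu_0)^2}{\sum_{i=1}^n (x_i-\bar{x})^2}\right]^{-n/2}.
\]

Thus

\[
-2\log\lambda(X) = n\log\left\{1 + \frac{n(\bar{X}-\mu_0)^2}{\sum_{i=1}^n (X_i-\bar{X})^2}\right\}.
\]

Under $H_0$, $\sqrt{n}(\bar{X}-\mu_0)/\sigma \sim N(0,1)$ and $\sum_{i=1}^n (X_i-\bar{X})^2/\sigma^2 \sim \chi^2(n-1)$. Also
$\sum_{i=1}^n (X_i-\bar{X})^2/[(n-1)\sigma^2] \to 1$ in probability. It follows that if $Z \sim N(0,1)$ then $-2\log\lambda(X)$ has the
same limiting distribution as $n\log\left\{1 + \dfrac{Z^2}{n-1}\right\}$. Moreover,

\[
\lim_{n\to\infty}\left(1 + \frac{Z^2}{n-1}\right)^{n} = e^{Z^2} \quad\text{with probability 1},
\]

and since logarithm is a continuous function we see that

\[
n\log\left\{1 + \frac{Z^2}{n-1}\right\} \to Z^2 \quad\text{with probability 1}.
\]

Thus $-2\log\lambda(X) \to Y$ in law, where $Y \sim \chi^2(1)$. This result is consistent with Theorem 3.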
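
The limit used in Example 5 can be seen numerically. The sketch below is illustrative only: the value $z = 1.96$, the grid of sample sizes, and the function name are arbitrary choices. It evaluates $n\log\{1 + z^2/(n-1)\}$ for increasing $n$ and compares it with $z^2$.

```python
import math

def limiting_form(z, n):
    """n * log(1 + z^2 / (n - 1)), the expression that stands in for -2 log lambda(X)."""
    return n * math.log1p(z * z / (n - 1))

z = 1.96  # illustrative value of the standard normal variable
values = {n: limiting_form(z, n) for n in (5, 50, 500, 5000)}
gaps = [abs(values[n] - z * z) for n in (5, 50, 500, 5000)]
print(values, z * z)
```

The gap to $z^2$ shrinks roughly like $1/n$, which is why the $\chi^2(1)$ cutoff is only an approximation for small samples, where the exact $t(n-1)$ cutoff from Example 2 is available anyway.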


PROBLEMS 10.2 


1. Prove Theorems 1 and 2.

2. A random sample of size $n$ is taken from PMF $P(X_j = x_j) = p_j$, $j = 1,2,3,4$, $0 <
p_j < 1$, $\sum_{j=1}^4 p_j = 1$. Find the form of the GLR test of $H_0\colon p_1 = p_2 = p_3 = p_4 = 1/4$
against $H_1\colon p_1 = p_2 = p/2$, $p_3 = p_4 = (1-p)/2$, $0 < p < 1$.

3. Find the GLR test of $H_0\colon p = p_0$ against $H_1\colon p \ne p_0$, based on a sample of size 1
from $b(n,p)$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown.
Find the GLR test of $H_0\colon \sigma = \sigma_0$ against $H_1\colon \sigma \ne \sigma_0$.

5. Let $X_1, X_2, \ldots, X_k$ be a sample from PMF

\[
P_N\{X = j\} = \frac{1}{N}, \qquad j = 1, 2, \ldots, N,\ N \ge 1 \text{ is an integer}.
\]

(a) Find the GLR test of $H_0\colon N \le N_0$ against $H_1\colon N > N_0$.
(b) Find the GLR test of $H_0\colon N = N_0$ against $H_1\colon N \ne N_0$.

6. For a sample of size 1 from PDF

\[
f_\theta(x) = \frac{2}{\theta^2}(\theta - x), \qquad 0 < x < \theta,
\]

find the GLR test of $\theta = \theta_0$ against $\theta \ne \theta_0$.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \beta)$:

(a) Find the GLR test of $\beta = \beta_0$ against $\beta \ne \beta_0$.
(b) Find the GLR test of $\beta \le \beta_0$ against $\beta > \beta_0$.

8. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate normal
population with $EX_i = \mu_1$, $EY_i = \mu_2$, $\operatorname{var}(X_i) = \sigma^2$, $\operatorname{var}(Y_i) = \sigma^2$, and $\operatorname{cov}(X_i, Y_i) = \rho\sigma^2$.
Show that the likelihood ratio test of the null hypothesis $H_0\colon \rho = 0$ against $H_1\colon \rho \ne 0$
reduces to rejecting $H_0$ if $|R| > c$, where $R = 2S_{11}/(S_1^2 + S_2^2)$, $S_{11}$, $S_1^2$, and $S_2^2$ being
the sample covariance and the sample variances, respectively. (For the PDF of the
test statistic $R$, see Problem 7.7.1.)

9. Let $X_1, X_2, \ldots, X_m$ be iid $G(1, \theta)$ RVs and let $Y_1, Y_2, \ldots, Y_n$ be iid $G(1, \mu)$ RVs, where
$\theta$ and $\mu$ are unknown positive real numbers. Assume that the $X$'s and the $Y$'s are
independent. Develop an $\alpha$-level GLR test for testing $H_0\colon \theta = \mu$ against $H_1\colon \theta \ne \mu$.

10. A die is tossed 60 times in order to test $H_0\colon P\{j\} = 1/6$, $j = 1, 2, \ldots, 6$ (die is fair)
against $H_1\colon P\{2\} = P\{4\} = P\{6\} = 2/9$, $P\{1\} = P\{3\} = P\{5\} = 1/9$. Find the
GLR test.

11. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x) = \exp\{-(x-\theta)\}$, $x \ge \theta$, and $= 0$
otherwise. Find the level $\alpha$ GLR test for testing $H_0\colon \theta \le \theta_0$ against $H_1\colon \theta > \theta_0$.

12. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common Pareto PDF $f_\theta(x) = \theta/x^2$ for $x > \theta$,
and $= 0$ elsewhere. Show that the family of joint PDFs has MLR in $X_{(1)}$ and find a
size $\alpha$ test of $H_0\colon \theta = \theta_0$ against $H_1\colon \theta > \theta_0$. Show that the GLR test coincides with
the UMP test.


10.3. CHI-SQUARE TESTS 


In this section we consider a variety of tests where the test statistic has an exact or a limit- 
ing chi-square distribution. Chi-square tests are also used for testing some nonparametric 
hypotheses and will be taken up again in Chapter 13. 

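We begin with tests concerning variances in sampling from a normal population. Let
$X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs where $\sigma^2$ is unknown. We wish to test a hypothesis
of the type $\sigma^2 \ge \sigma_0^2$, $\sigma^2 \le \sigma_0^2$, or $\sigma^2 = \sigma_0^2$, where $\sigma_0$ is some given positive number. We
summarize the tests in the following table.


Reject $H_0$ at level $\alpha$ if

      $H_0$                   $H_1$                    $\mu$ Known                                                    $\mu$ Unknown

I.    $\sigma \ge \sigma_0$   $\sigma < \sigma_0$      $\sum_{i=1}^n (x_i-\mu)^2 \le \chi^2_{n,1-\alpha}\,\sigma_0^2$      $s^2 \le \dfrac{\sigma_0^2}{n-1}\,\chi^2_{n-1,1-\alpha}$

II.   $\sigma \le \sigma_0$   $\sigma > \sigma_0$      $\sum_{i=1}^n (x_i-\mu)^2 \ge \chi^2_{n,\alpha}\,\sigma_0^2$        $s^2 \ge \dfrac{\sigma_0^2}{n-1}\,\chi^2_{n-1,\alpha}$

III.  $\sigma = \sigma_0$     $\sigma \ne \sigma_0$    $\sum_{i=1}^n (x_i-\mu)^2 \le \chi^2_{n,1-\alpha/2}\,\sigma_0^2$    $s^2 \le \dfrac{\sigma_0^2}{n-1}\,\chi^2_{n-1,1-\alpha/2}$
                                                       or                                                             or
                                                       $\sum_{i=1}^n (x_i-\mu)^2 \ge \chi^2_{n,\alpha/2}\,\sigma_0^2$      $s^2 \ge \dfrac{\sigma_0^2}{n-1}\,\chi^2_{n-1,\alpha/2}$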

Remark 1. All these tests can be derived by the standard likelihood ratio procedure. If $\mu$
is unknown, tests I and II are UMP unbiased (and UMP invariant). If $\mu$ is known, tests I
and II are UMP (see Example 9.4.5). For tests III we have chosen constants $c_1$, $c_2$ so that
each tail has probability $\alpha/2$. This is the customary procedure, even though it destroys the
unbiasedness property of the tests, at least for small samples.


Example 1. A manufacturer claims that the lifetime of a certain brand of batteries produced
by his factory has a variance of 5000 (hours)$^2$. A sample of size 26 has a variance
of 7200 (hours)$^2$. Assuming that it is reasonable to treat these data as a random sample
from a normal population, let us test the manufacturer's claim at the $\alpha = 0.02$ level. Here
$H_0\colon \sigma^2 = 5000$ is to be tested against $H_1\colon \sigma^2 \ne 5000$. We reject $H_0$ if either

\[
s^2 = 7200 \le \frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,1-\alpha/2} \quad\text{or}\quad s^2 \ge \frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,\alpha/2}.
\]

We have

\[
\frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,1-\alpha/2} = \frac{5000}{25} \times 11.524 = 2304.8,
\]

\[
\frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,\alpha/2} = \frac{5000}{25} \times 44.314 = 8862.8.
\]

Since $s^2$ is neither $\le 2304.8$ nor $\ge 8862.8$, we cannot reject the manufacturer's claim at
level 0.02.
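
The arithmetic of Example 1 can be reproduced in a few lines. In this sketch the chi-square percentage points 11.524 and 44.314 are taken from the example itself rather than computed, and the function name is a hypothetical helper.

```python
def variance_test_cutoffs(sigma0_sq, n, chi2_lower, chi2_upper):
    """Cutoffs for the two-sided test III with mu unknown: (sigma0^2/(n-1)) * chi-square points."""
    scale = sigma0_sq / (n - 1)
    return scale * chi2_lower, scale * chi2_upper

# chi^2_{25, 0.99} = 11.524 and chi^2_{25, 0.01} = 44.314, as quoted in Example 1.
lower, upper = variance_test_cutoffs(5000.0, 26, 11.524, 44.314)
s_sq = 7200.0
reject = s_sq <= lower or s_sq >= upper
print(lower, upper, reject)
```

The sample variance lies between the two cutoffs, so `reject` is `False`, in agreement with the conclusion of the example.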


A test based on a chi-square statistic is also used for testing the equality of several
proportions. Let $X_1, X_2, \ldots, X_k$ be independent RVs with $X_i \sim b(n_i, p_i)$, $i = 1, 2, \ldots, k$,
$k \ge 2$.


Theorem 1. The RV $\sum_{i=1}^k \left\{(X_i - n_i p_i)/\sqrt{n_i p_i(1-p_i)}\right\}^2$ converges in distribution to the
$\chi^2(k)$ RV as $n_1, n_2, \ldots, n_k \to \infty$.


Proof. The proof is left as an exercise. 


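If $n_1, n_2, \ldots, n_k$ are large, we can use Theorem 1 to test $H_0\colon p_1 = p_2 = \cdots = p_k = p$
against all alternatives. If $p$ is known, we compute

\[
y = \sum_{i=1}^k \left\{\frac{x_i - n_i p}{\sqrt{n_i p(1-p)}}\right\}^2,
\]

and if $y \ge \chi^2_{k,\alpha}$ we reject $H_0$. In practice $p$ will be unknown. Let $\mathbf{p} = (p_1, p_2, \ldots, p_k)$. Then
the likelihood function is

\[
L(\mathbf{p}; x_1, \ldots, x_k) = \prod_{i=1}^k \binom{n_i}{x_i} p_i^{x_i}(1-p_i)^{n_i - x_i},
\]

so that

\[
\log L(\mathbf{p}; x) = \sum_{i=1}^k \log\binom{n_i}{x_i} + \sum_{i=1}^k x_i \log p_i + \sum_{i=1}^k (n_i - x_i)\log(1-p_i).
\]

The MLE $\hat{p}$ of $p$ under $H_0$ is therefore given by

\[
\frac{\sum_{i=1}^k x_i}{p} - \frac{\sum_{i=1}^k (n_i - x_i)}{1-p} = 0,
\]

that is,

\[
\hat{p} = \frac{x_1 + x_2 + \cdots + x_k}{n_1 + n_2 + \cdots + n_k}.
\]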


Under certain regularity assumptions (see Cramér [17, pp. 426–427]) it can be shown that
the statistic

\[
Y_1 = \sum_{i=1}^k \frac{(X_i - n_i\hat{p})^2}{n_i\hat{p}(1-\hat{p})} \tag{1}
\]

is asymptotically $\chi^2(k-1)$. Thus the test rejects $H_0\colon p_1 = p_2 = \cdots = p_k = p$, $p$ unknown,
at level $\alpha$ if $y_1 \ge \chi^2_{k-1,\alpha}$.

It should be remembered that the tests based on Theorem 1 are all large-sample tests and
hence not exact, in contrast to the tests concerning the variance discussed above, which
are all exact tests. In the case $k = 1$, UMP tests of $p \ge p_0$ and $p \le p_0$ exist and can be
obtained by the MLR method described in Section 9.4. For testing $p = p_0$, the usual test
is UMP unbiased.

In the case $k = 2$, if $n_1$ and $n_2$ are large, a test based on the normal distribution can be
used instead of Theorem 1. In this case the statistic

\[
Z = \frac{X_1/n_1 - X_2/n_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}},
\]

where $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$, is asymptotically $N(0,1)$ under $H_0\colon p_1 = p_2 = p$. If $p$ is
known, one uses $p$ instead of $\hat{p}$. It is not too difficult to show that $Z^2$ is equal to $Y_1$, so that
the two tests are equivalent.
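
The equality of $Z^2$ and $Y_1$ for $k = 2$ can be confirmed on any pair of counts. The following sketch uses made-up counts and hypothetical function names; both statistics use the pooled estimate $\hat{p}$.

```python
import math

def pooled_chi_square(xs, ns):
    """Y_1 = sum (x_i - n_i * p_hat)^2 / (n_i * p_hat * (1 - p_hat)) with the pooled MLE p_hat."""
    p_hat = sum(xs) / sum(ns)
    return sum((x - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat)) for x, n in zip(xs, ns))

def two_sample_z(x1, n1, x2, n2):
    """Z = (x1/n1 - x2/n2) / sqrt(p_hat * (1 - p_hat) * (1/n1 + 1/n2))."""
    p_hat = (x1 + x2) / (n1 + n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

x1, n1, x2, n2 = 45, 120, 30, 100  # made-up counts
z = two_sample_z(x1, n1, x2, n2)
y1 = pooled_chi_square([x1, x2], [n1, n2])
print(z * z, y1)
```

The agreement follows from $x_1 - n_1\hat{p} = -(x_2 - n_2\hat{p}) = n_1 n_2 (x_1/n_1 - x_2/n_2)/(n_1 + n_2)$.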

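For small samples the so-called Fisher–Irwin test is commonly used and is based on
the conditional distribution of $X_1$ given $T = X_1 + X_2$. Let $\rho = [p_1(1-p_2)]/[p_2(1-p_1)]$.
Then

\[
P\{X_1 + X_2 = t\} = \sum_{j=0}^{t} \binom{n_1}{j} p_1^j (1-p_1)^{n_1-j} \binom{n_2}{t-j} p_2^{t-j}(1-p_2)^{n_2-t+j}
= \sum_{j=0}^{t} \binom{n_1}{j}\binom{n_2}{t-j}\rho^j\, a(n_1, n_2), \tag{2}
\]

where

\[
a(n_1, n_2) = (1-p_1)^{n_1}(1-p_2)^{n_2}\{p_2/(1-p_2)\}^{t}.
\]

It follows that

\[
P\{X_1 = x \mid X_1 + X_2 = t\}
= \frac{\binom{n_1}{x} p_1^x (1-p_1)^{n_1-x}\binom{n_2}{t-x} p_2^{t-x}(1-p_2)^{n_2-t+x}}{a(n_1, n_2)\sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t-j}\rho^j}
= \frac{\binom{n_1}{x}\binom{n_2}{t-x}\rho^x}{\sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t-j}\rho^j}.
\]

On the boundary of any of the hypotheses $p_1 = p_2$, $p_1 \le p_2$, or $p_1 \ge p_2$ we note that $\rho = 1$
so that

\[
P\{X_1 = x \mid X_1 + X_2 = t\} = \frac{\binom{n_1}{x}\binom{n_2}{t-x}}{\binom{n_1+n_2}{t}},
\]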
which is a hypergeometric distribution. For testing $H_0\colon p_1 \le p_2$ against $H_1\colon p_1 > p_2$, large values
of $X_1$ are evidence against $H_0$, so this conditional test rejects if $X_1 \ge k(t)$, where $k(t)$ is the
smallest integer for which $P\{X_1 \ge k(t) \mid T = t\} \le \alpha$. Obvious
modifications yield critical regions for testing $p_1 = p_2$, and $p_1 \ge p_2$ against corresponding
alternatives.
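
The conditional cutoff can be computed directly from the hypergeometric probabilities. The sketch below is an illustration under assumptions: the values $n_1 = n_2 = 10$, $t = 8$, $\alpha = 0.05$ and the function names are not from the text. It supports both one-sided versions: a lower-tail cutoff (the largest $k$ with $P\{X_1 \le k \mid T = t\} \le \alpha$, appropriate when small $X_1$ is evidence against the null hypothesis) and an upper-tail cutoff (the smallest $k$ with $P\{X_1 \ge k \mid T = t\} \le \alpha$).

```python
from math import comb

def conditional_pmf(x, t, n1, n2):
    """P{X1 = x | X1 + X2 = t} on the boundary p1 = p2 (hypergeometric)."""
    if x < max(0, t - n2) or x > min(n1, t):
        return 0.0
    return comb(n1, x) * comb(n2, t - x) / comb(n1 + n2, t)

def critical_value(t, n1, n2, alpha, tail):
    """Conditional cutoff k(t); returns None if no rejection region of level alpha exists."""
    lo, hi = max(0, t - n2), min(n1, t)
    support = range(lo, hi + 1) if tail == "lower" else range(hi, lo - 1, -1)
    k, total = None, 0.0
    for x in support:
        total += conditional_pmf(x, t, n1, n2)
        if total > alpha:
            break
        k = x  # the tail through x still has conditional probability <= alpha
    return k

n1, n2, t, alpha = 10, 10, 8, 0.05  # illustrative values
k_lower = critical_value(t, n1, n2, alpha, "lower")
k_upper = critical_value(t, n1, n2, alpha, "upper")
total_mass = sum(conditional_pmf(x, t, n1, n2) for x in range(t + 1))
print(k_lower, k_upper, total_mass)
```

Because the conditional distribution is discrete, the attained conditional level is usually below $\alpha$; randomizing on the boundary would be needed to reach $\alpha$ exactly.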

In applications a wide variety of problems can be reduced to the multinomial distribution
model. We therefore consider the problem of testing the parameters of a multinomial
distribution. Let $(X_1, X_2, \ldots, X_{k-1})$ be a sample from a multinomial distribution with
parameters $n, p_1, p_2, \ldots, p_{k-1}$, and let us write $X_k = n - X_1 - \cdots - X_{k-1}$, and $p_k = 1 -
p_1 - \cdots - p_{k-1}$. The difference between the model of Theorem 1 and the multinomial model
is the independence of the $X_i$'s.


Theorem 2. Let $(X_1, X_2, \ldots, X_{k-1})$ be a multinomial RV with parameters $n, p_1, p_2, \ldots,
p_{k-1}$. Then the RV

\[
U_k = \sum_{i=1}^k \left\{\frac{(X_i - np_i)^2}{np_i}\right\} \tag{3}
\]

is asymptotically distributed as a $\chi^2(k-1)$ RV (as $n \to \infty$).


Proof. For the general proof we refer the reader to Cramér [17, pp. 417–419] or
Ferguson [29, p. 61]. We will consider here the $k = 2$ case to make the result a little more
plausible. We have

\[
\begin{aligned}
U_2 &= \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_2 - np_2)^2}{np_2}
     = \frac{(X_1 - np_1)^2}{np_1} + \frac{[n - X_1 - n(1-p_1)]^2}{n(1-p_1)}\\
    &= (X_1 - np_1)^2\left[\frac{1}{np_1} + \frac{1}{n(1-p_1)}\right]\\
    &= \frac{(X_1 - np_1)^2}{np_1(1-p_1)}.
\end{aligned}
\]

It follows from Theorem 1 that $U_2 \to Y$ in law as $n \to \infty$, where $Y \sim \chi^2(1)$.


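To use Theorem 2 to test $H_0\colon p_1 = p_1', \ldots, p_k = p_k'$, we need only to compute the quantity

\[
u = \sum_{i=1}^k \left\{\frac{(x_i - np_i')^2}{np_i'}\right\}
\]

from the sample; if $n$ is large, we reject $H_0$ if $u > \chi^2_{k-1,\alpha}$.

Example 2. A die is rolled 120 times with the following results:


Face value:   1    2    3    4    5    6
Frequency:   20   30   20   25   15   10


Let us test the hypothesis that the die is fair at level $\alpha = 0.05$. The null hypothesis is
$H_0\colon p_i = \frac{1}{6}$, $i = 1, 2, \ldots, 6$, where $p_i$ is the probability that the face value is $i$, $1 \le i \le 6$. By
Theorem 2 we reject $H_0$ if

\[
u = \sum_{i=1}^6 \frac{[x_i - 120(\frac{1}{6})]^2}{120(\frac{1}{6})} > \chi^2_{5,0.05}.
\]

We have

\[
u = 0 + \frac{10^2}{20} + 0 + \frac{5^2}{20} + \frac{5^2}{20} + \frac{10^2}{20} = 12.5.
\]

Since $\chi^2_{5,0.05} = 11.07$, we reject $H_0$. Note that, if we choose $\alpha = 0.025$, then $\chi^2_{5,0.025} = 12.8$,
and we cannot reject at this level.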
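
The statistic of Example 2 is a one-line computation. The following sketch uses the frequencies of the example; the function name is a hypothetical helper, and the critical values 11.07 and 12.8 are the ones quoted in the text.

```python
def pearson_statistic(observed, probs):
    """u = sum (x_i - n p_i)^2 / (n p_i) for observed cell counts and null probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

observed = [20, 30, 20, 25, 15, 10]      # frequencies of faces 1..6 in 120 rolls
u = pearson_statistic(observed, [1 / 6] * 6)
reject_at_05 = u > 11.07                 # chi^2_{5, 0.05}, as quoted in the example
reject_at_025 = u > 12.8                 # chi^2_{5, 0.025}, as quoted in the example
print(u, reject_at_05, reject_at_025)
```

The statistic equals 12.5, which lies between the two critical values, so the conclusion depends on the chosen level exactly as noted in the example.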


Theorem 2 has much wider applicability, and we will later study its application to 
contingency tables. Here we consider the application of Theorem 2 to testing the null 
hypothesis that the DF of an RV X has a specified form. 


Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a random sample on $X$. Also, let $H_0\colon X \sim F$, where
the functional form of the DF $F$ is known completely. Consider a collection of disjoint
Borel sets $A_1, A_2, \ldots, A_k$ that form a partition of the real line. Let $P\{X \in A_i\} = p_i$, $i =
1, 2, \ldots, k$, and assume that $p_i > 0$ for each $i$. Let $Y_j$ = number of $X_i$'s in $A_j$, $j = 1, 2, \ldots, k$,
$i = 1, 2, \ldots, n$. Then the joint distribution of $(Y_1, Y_2, \ldots, Y_{k-1})$ is multinomial with parameters
$n, p_1, p_2, \ldots, p_{k-1}$. Clearly, $Y_k = n - Y_1 - \cdots - Y_{k-1}$ and $p_k = 1 - p_1 - \cdots - p_{k-1}$.


The proof of Theorem 3 is obvious. One frequently selects $A_1, A_2, \ldots, A_k$ as disjoint
intervals. Theorem 3 is especially useful when one or more of the parameters associated
with the DF $F$ are unknown. In that case the following result is useful.


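Theorem 4. Let $H_0\colon X \sim F_\theta$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_r)$ is unknown. Let $X_1, X_2, \ldots, X_n$
be independent observations on $X$, and suppose that the MLEs of $\theta_1, \theta_2, \ldots, \theta_r$ exist and
are, respectively, $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_r$. Let $A_1, A_2, \ldots, A_k$ be a collection of disjoint Borel sets that
cover the real line, and let

\[
\hat{p}_i = P_{\hat\theta}\{X \in A_i\} > 0, \qquad i = 1, 2, \ldots, k,
\]

where $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_r)$, and $P_\theta$ is the probability distribution associated with $F_\theta$. Let
$Y_1, Y_2, \ldots, Y_k$ be the RVs defined as follows: $Y_i$ = number of $X_1, X_2, \ldots, X_n$ in $A_i$, $i =
1, 2, \ldots, k$.

Then the RV

\[
V_k = \sum_{i=1}^k \left\{\frac{(Y_i - n\hat{p}_i)^2}{n\hat{p}_i}\right\}
\]

is asymptotically distributed as a $\chi^2(k-r-1)$ RV (as $n \to \infty$).

The proof of Theorem 4 and some regularity conditions required on $F_\theta$ are given in
Rao [88, pp. 391–392].

To test $H_0\colon X \sim F$, where $F$ is completely specified, we reject $H_0$ if

\[
u = \sum_{i=1}^k \left\{\frac{(y_i - np_i)^2}{np_i}\right\} > \chi^2_{k-1,\alpha},
\]

provided that $n$ is sufficiently large. If the null hypothesis is $H_0\colon X \sim F_\theta$, where $F_\theta$ is
known except for the parameter $\theta$, we use Theorem 4 and reject $H_0$ if

\[
v = \sum_{i=1}^k \left\{\frac{(y_i - n\hat{p}_i)^2}{n\hat{p}_i}\right\} > \chi^2_{k-r-1,\alpha},
\]

where $r$ is the number of parameters estimated.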


Example 3. The following data were obtained from a table of random numbers of normal 
distribution with mean 0 and variance 1. 


0.464 0.137 2.455 —0.323 —0.068 
0.906 —0.513 —0.525 0.595 0.881 
—0.482 1.678 —0.057 —1.229 —0.486 
—1.787 —0.261 1.237 1.046 —0.508 


We want to test the null hypothesis that the DF $F$ from which the data came is normal
with mean 0 and variance 1. Here $F$ is completely specified. Let us choose three intervals
$(-\infty, -0.5]$, $(-0.5, 0.5]$, and $(0.5, \infty)$. We see that $Y_1 = 5$, $Y_2 = 8$, and $Y_3 = 7$.

Also, if $Z$ is $N(0,1)$, then $p_1 = 0.3085$, $p_2 = 0.3830$, and $p_3 = 0.3085$. Thus

\[
u = \sum_{i=1}^3 \frac{(y_i - np_i)^2}{np_i}
= \frac{(5 - 20 \times 0.3085)^2}{6.17} + \frac{(8 - 20 \times 0.383)^2}{7.66} + \frac{(7 - 20 \times 0.3085)^2}{6.17} < 1.
\]

Also, $\chi^2_{2,0.05} = 5.99$, so we cannot reject $H_0$ at level 0.05.
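
The counts, cell probabilities, and statistic of Example 3 can be recomputed from the raw data. This is a sketch using only the Python standard library; the variable names are illustrative, and the critical value 5.99 is the one quoted in the example.

```python
from bisect import bisect_left
from statistics import NormalDist

data = [0.464, 0.137, 2.455, -0.323, -0.068,
        0.906, -0.513, -0.525, 0.595, 0.881,
        -0.482, 1.678, -0.057, -1.229, -0.486,
        -1.787, -0.261, 1.237, 1.046, -0.508]
cuts = [-0.5, 0.5]  # cells (-inf, -0.5], (-0.5, 0.5], (0.5, inf)

# bisect_left counts the cut points strictly below x, so x == -0.5 falls in the first cell.
counts = [0] * (len(cuts) + 1)
for x in data:
    counts[bisect_left(cuts, x)] += 1

# Cell probabilities under N(0, 1), as differences of the standard normal DF.
cdf = [0.0] + [NormalDist().cdf(c) for c in cuts] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]

n = len(data)
u = sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))
print(counts, [round(p, 4) for p in probs], u)
```

With only 20 observations the expected cell counts are about 6 to 8, so the chi-square approximation is rough here; the statistic is nevertheless far below 5.99.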


Example 4. In a 72-hour period on a long holiday weekend there was a total of 306 fatal
automobile accidents. The data are as follows:


Number of Fatal Accidents
per Hour                      Number of Hours
0 or 1                               4
2                                   10
3                                   15
4                                   12
5                                   12
6                                    6
7                                    6
8 or more                            7


Let us test the hypothesis that the number of accidents per hour is a Poisson RV.
Since the mean of the Poisson RV is not given, we estimate it by

\[
\hat\lambda = \bar{x} = \frac{306}{72} = 4.25.
\]

Let us now estimate $p_i = P_\lambda\{X = i\}$, $i = 0, 1, 2, \ldots$; $\hat{p}_0 = e^{-\hat\lambda} = 0.0143$. Note that

\[
\frac{P\{X = x+1\}}{P\{X = x\}} = \frac{\lambda}{x+1},
\]

so that $\hat{p}_{i+1} = [\hat\lambda/(i+1)]\hat{p}_i$. Thus

\[
\begin{aligned}
&\hat{p}_1 = 0.0606, \quad \hat{p}_2 = 0.1288, \quad \hat{p}_3 = 0.1825, \quad \hat{p}_4 = 0.1939,\\
&\hat{p}_5 = 0.1648, \quad \hat{p}_6 = 0.1167, \quad \hat{p}_7 = 0.0709, \quad \hat{p}_8 = 1 - 0.9325 = 0.0675.
\end{aligned}
\]

Here $\hat{p}_8$ is the estimated probability of 8 or more accidents. The observed and expected frequencies are as follows:


i                                       0 or 1    2       3       4       5       6      7     8 or more

Observed Frequency, $o_i$                  4      10      15      12      12      6      6        7
Expected Frequency, $e_i = 72\hat{p}_i$   5.38    9.28   13.14   13.96   11.87   8.41   5.10     4.86


Thus

\[
u = \sum_{i=1}^8 \frac{(o_i - e_i)^2}{e_i} = 2.74.
\]

Since we estimated one parameter, the number of degrees of freedom is $k - r - 1 = 8 -
1 - 1 = 6$. From Table ST3, $\chi^2_{6,0.05} = 12.6$, and since $2.74 < 12.6$, we cannot reject the null
hypothesis.
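
The fit in Example 4 can be reproduced with the recursion $\hat{p}_{i+1} = [\hat\lambda/(i+1)]\hat{p}_i$. In the sketch below the observed counts are those of the example, the variable names are illustrative, and the critical value 12.6 is the one quoted in the text. Because the code carries full precision rather than four-decimal probabilities, its statistic agrees with 2.74 only up to rounding.

```python
import math

observed = [4, 10, 15, 12, 12, 6, 6, 7]  # cells: 0 or 1, 2, 3, 4, 5, 6, 7, 8 or more
n_hours, total_accidents = 72, 306
lam = total_accidents / n_hours           # estimate of the Poisson mean

pmf = [math.exp(-lam)]                    # p_0
for i in range(7):
    pmf.append(pmf[-1] * lam / (i + 1))   # p_{i+1} = lam / (i + 1) * p_i, up to p_7

# Pool 0 and 1 into the first cell; the last cell is the remaining tail probability.
cell_probs = [pmf[0] + pmf[1]] + pmf[2:8] + [1 - sum(pmf)]
expected = [n_hours * p for p in cell_probs]
u = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                # k - r - 1 with one estimated parameter
print(lam, [round(e, 2) for e in expected], round(u, 2), df)
```

The first and last expected counts are close to 5, in line with the pooling of the cells "0 or 1" and "8 or more" in the example.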


Remark 2. Any application of Theorem 3 or 4 requires that we choose sets $A_1, A_2, \ldots, A_k$,
and frequently these are chosen to be disjoint intervals. As a rule of thumb, we choose the
length of each interval in such a way that the probability $P\{X \in A_i\}$ under $H_0$ is approximately
$1/k$. Moreover, it is desirable to have $n/k \ge 5$ or, rather, $e_i \ge 5$ for each $i$. If any of
the $e_i$'s is $< 5$, the corresponding interval is pooled with one or more adjoining intervals
to make the cell frequency at least 5. The number of degrees of freedom, if any pooling
is done, is the number of classes after pooling, minus 1, minus the number of parameters
estimated.


Finally, we consider a test of homogeneity of several multinomial distributions. Suppose
we have $c$ samples of sizes $n_1, n_2, \ldots, n_c$ from $c$ multinomial distributions. Let the
associated probabilities with the $j$th population be $(p_{1j}, p_{2j}, \ldots, p_{rj})$, where $\sum_{i=1}^r p_{ij} = 1$,
$j = 1, 2, \ldots, c$. Given observations $N_{ij}$, $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$, with $\sum_{i=1}^r N_{ij} = n_j$, $j =
1, 2, \ldots, c$, we wish to test $H_0\colon p_{ij} = p_i$, for $j = 1, 2, \ldots, c$, $i = 1, 2, \ldots, r-1$. The case $c = 1$
is covered by Theorem 2. By Theorem 2 for each $j$

\[
U_j = \sum_{i=1}^r \left\{\frac{(N_{ij} - n_j p_i)^2}{n_j p_i}\right\}
\]

has a limiting $\chi^2_{r-1}$ distribution. Since samples are independent, the statistic

\[
U_{rc} = \sum_{j=1}^c \sum_{i=1}^r \frac{(N_{ij} - n_j p_i)^2}{n_j p_i} \tag{4}
\]

has a limiting $\chi^2_{c(r-1)}$ distribution. If $p_i$'s are unknown we use the MLEs

\[
\hat{p}_i = \frac{\sum_{j=1}^c N_{ij}}{\sum_{j=1}^c n_j}, \qquad i = 1, 2, \ldots, r-1,
\]

for $p_i$ and we see that the statistic

\[
V_{rc} = \sum_{j=1}^c \sum_{i=1}^r \frac{(N_{ij} - n_j\hat{p}_i)^2}{n_j\hat{p}_i}
\]

has a limiting chi-square distribution with $c(r-1) - (r-1) = (c-1)(r-1)$ d.f. We reject $H_0$ at
(approximate) level $\alpha$ if $V_{rc} > \chi^2_{(r-1)(c-1),\alpha}$.

Example 5. A market analyst believes that there is no difference in preferences of television
viewers among the four Ohio cities of Toledo, Columbus, Cleveland, and Cincinnati.
In order to test this belief, independent random samples of 150, 200, 250, and 200 persons
were selected from the four cities and asked, "What type of program do you prefer
most: Mystery, Soap, Comedy, or News Documentary?" The following responses were
recorded.


                                  City
Program Type    Toledo    Columbus    Cleveland    Cincinnati
Mystery           50         70          85           60
Soap              45         50          58           40
Comedy            35         50          72           67
News              20         30          35           33
Sample Size      150        200         250          200


Under the null hypothesis that the proportions of viewers who prefer the four types of
programs are the same in each city, the maximum likelihood estimates of $p_i$, $i = 1, 2, 3, 4$
are given by

\[
\hat{p}_1 = \frac{50+70+85+60}{150+200+250+200} = \frac{265}{800} = 0.33, \qquad
\hat{p}_3 = \frac{35+50+72+67}{800} = \frac{224}{800} = 0.28,
\]

\[
\hat{p}_2 = \frac{45+50+58+40}{800} = \frac{193}{800} = 0.24, \qquad
\hat{p}_4 = \frac{20+30+35+33}{800} = \frac{118}{800} = 0.15.
\]

Here $p_1$ = proportion of people who prefer mystery, and so on. The following table
gives the expected frequencies under $H_0$.


Expected Number of Responses Under $H_0$

Program Type    Toledo                 Columbus              Cleveland               Cincinnati
Mystery         150 × 0.33 = 49.5      200 × 0.33 = 66       250 × 0.33 = 82.5       200 × 0.33 = 66
Soap            150 × 0.24 = 36        200 × 0.24 = 48       250 × 0.24 = 60         200 × 0.24 = 48
Comedy          150 × 0.28 = 42        200 × 0.28 = 56       250 × 0.28 = 70         200 × 0.28 = 56
News            150 × 0.15 = 22.5      200 × 0.15 = 30       250 × 0.15 = 37.5       200 × 0.15 = 30
Sample Size     150                    200                   250                     200


It follows that

\[
\begin{aligned}
u_{44} &= \frac{(50-49.5)^2}{49.5} + \frac{(45-36)^2}{36} + \frac{(35-42)^2}{42} + \frac{(20-22.5)^2}{22.5}\\
&\quad + \frac{(70-66)^2}{66} + \frac{(50-48)^2}{48} + \frac{(50-56)^2}{56} + \frac{(30-30)^2}{30}\\
&\quad + \frac{(85-82.5)^2}{82.5} + \frac{(58-60)^2}{60} + \frac{(72-70)^2}{70} + \frac{(35-37.5)^2}{37.5}\\
&\quad + \frac{(60-66)^2}{66} + \frac{(40-48)^2}{48} + \frac{(67-56)^2}{56} + \frac{(33-30)^2}{30}\\
&= 9.37.
\end{aligned}
\]

Since $c = 4$ and $r = 4$, the number of degrees of freedom is $(4-1)(4-1) = 9$ and we
note that under $H_0$

\[
0.30 < P(U_{44} \ge 9.37) < 0.50.
\]

With such a large $P$-value we can hardly reject $H_0$. The data do not offer any evidence to
conclude that the proportions in the four cities are different.
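
The homogeneity statistic of Example 5 can be recomputed from the table of responses. In this sketch the function and variable names are illustrative. The statistic is evaluated twice: once with the two-decimal proportions 0.33, 0.24, 0.28, 0.15 used in the example, and once with the unrounded pooled MLEs.

```python
def homogeneity_statistic(columns, probs=None):
    """Sum over samples j and categories i of (N_ij - n_j p_i)^2 / (n_j p_i).

    If probs is None, the pooled MLEs p_i = sum_j N_ij / sum_j n_j are used.
    """
    sizes = [sum(col) for col in columns]
    if probs is None:
        total = sum(sizes)
        probs = [sum(col[i] for col in columns) / total for i in range(len(columns[0]))]
    return sum((col[i] - n_j * p) ** 2 / (n_j * p)
               for col, n_j in zip(columns, sizes)
               for i, p in enumerate(probs))

# Rows within each city: Mystery, Soap, Comedy, News.
toledo, columbus = [50, 45, 35, 20], [70, 50, 50, 30]
cleveland, cincinnati = [85, 58, 72, 35], [60, 40, 67, 33]
cities = [toledo, columbus, cleveland, cincinnati]

u_rounded = homogeneity_statistic(cities, probs=[0.33, 0.24, 0.28, 0.15])
u_exact = homogeneity_statistic(cities)
df = (len(cities) - 1) * (len(toledo) - 1)
print(round(u_rounded, 2), round(u_exact, 2), df)
```

The rounded proportions reproduce the value 9.37 of the example; the unrounded MLEs give a slightly smaller value, and the conclusion with 9 d.f. is unchanged.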


PROBLEMS 10.3 


1. The standard deviation of capacity for batteries of a standard type is known to be 1.66 
ampere-hours. The following capacities (ampere-hours) were recorded for 10 bat- 
teries of a new type: 146, 141, 135, 142, 140, 143, 138, 137, 142, 136. Does the 
new battery differ from the standard type with respect to variability of capacity 

(Natrella [75, p. 4-1])? 

2. A manufacturer recorded the cut-off bias (volts) of a sample of 10 tubes as follows: 
12.1, 12.3, 11.8, 12.0, 12.4, 12.0, 12.1, 11.9, 12.2, 12.2. The variability of cut-off 
bias for tubes of a standard type as measured by the standard deviation is 0.208 
volts. Is the variability of the new tube, with respect to cut-off bias less than that of 
the standard type (Natrella [75, p. 4-5])? 

3. Approximately equal numbers of four different types of meters are in service and 
all types are believed to be equally likely to break down. The actual numbers of 
breakdowns reported are as follows: 


Type of Meter | 1 2 3 4 
Number of Breakdowns Reported | 30 40 33 47 


Is there evidence to conclude that the chances of failure of the four types are not 
equal (Natrella [75, p. 9-4])? 

4. Every clinical thermometer is classified into one of four categories, A, B, C, D, on 
the basis of inspection and test. From past experience it is known that thermometers 
produced by a certain manufacturer are distributed among the four categories in the 
following proportions: 

Category    |  A      B      C      D
Proportion  |  0.87   0.09   0.03   0.01


A new lot of 1336 thermometers is submitted by the manufacturer for inspection and 
test, and the following distribution into the four categories results: 


Category | A B C D
Number of Thermometers Reported | 1188 91 47 10 


Does this new lot of thermometers differ from the previous experience with regard 
to proportion of thermometers in each category (Natrella [75, p. 9-2])? 


5. A computer program is written to generate random numbers, X, uniformly in the 
interval 0 < X < 10. From 250 consecutive values the following data are obtained: 


X-value | 0-1.99 2-3.99 4-5.99 6-7.99 8-9.99
Frequency | 38 55 54 41 62 


Do these data offer any evidence that the program is not written properly? 

6. A machine working correctly cuts pieces of wire to a mean length of 10.5 cm with a 
standard deviation of 0.15 cm. Sixteen samples of wire were drawn at random from a 
production batch and measured with the following results (centimeters): 10.4, 10.6, 
10.1, 10.3, 10.2, 10.9, 10.5, 10.8, 10.6, 10.5, 10.7, 10.2, 10.7, 10.3, 10.4, 10.5. Test 
the hypothesis that the machine is working correctly. 

7. An experiment consists in tossing a coin until the first head shows up. One hun- 
dred repetitions of this experiment are performed. The frequency distribution of the 
number of trials required for the first head is as follows: 


Number of trials | 1 2 3 4 5 or more 
Frequency | 40 32 15 7 6 


Can we conclude that the coin is fair? 
8. Fit a binomial distribution to the following data: 


x 0 1 2 3 4 
Frequency: 8 46 55 40 11 


9. Prove Theorem 1.

10. Three dice are rolled independently 360 times each with the following results.


Face Value Die 1 Die 2 Die 3 


1 50 62 38 
2 48 55 60
3 69 61 64 
4 45 54 58 
5 71 78 73 
6 77 50 67 


Sample Size 360 360 360 


Are all the dice equally loaded? That is, test the hypothesis $H_0: p_{i1} = p_{i2} = p_{i3}$,
$i = 1, 2, \ldots, 6$, where $p_{i1}$ is the probability of getting an $i$ with die 1, and so on.

11. Independent random samples of 250 Democrats, 150 Republicans, and 100 Independent
voters were selected 1 week before a nonpartisan election for mayor of a large
city. Their preferences for candidates Albert, Basu, and Chatfield were recorded as
follows.


Party Affiliation 
Preference Democrat Republican Independent 
Albert 160 70 90 
Basu 32 45 25 
Chatfield 30 23 15 
Undecided 28 12 20 
Sample Size 250 150 150 


Are the proportions of voters in favor of Albert, Basu, and Chatfield the same within 
each political affiliation? 

12. Of 25 income tax returns audited in a small town, 10 were from low- and middle-income
families and 15 from high-income families. Two of the low-income families
and four of the high-income families were found to have underpaid their taxes. Are
the two proportions of families who underpaid taxes the same?

13. A candidate for a congressional seat checks her progress by taking a random sample
of 20 voters each week. Last week, six reported to be in her favor. This week nine
reported to be in her favor. Is there evidence to suggest that her campaign is working?
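
14. Let $\{X_{11}, X_{21}, \ldots, X_{r1}\}, \ldots, \{X_{1c}, X_{2c}, \ldots, X_{rc}\}$ be independent multinomial RVs
with parameters $(n_1, p_{11}, p_{21}, \ldots, p_{r1}), \ldots, (n_c, p_{1c}, p_{2c}, \ldots, p_{rc})$, respectively. Let
$X_{i\cdot} = \sum_{j=1}^{c} X_{ij}$ and $\sum_{j=1}^{c} n_j = n$. Show that the GLR test for testing $H_0: p_{ij} = p_i$,
for $j = 1, 2, \ldots, c$, $i = 1, 2, \ldots, r-1$, where the $p_i$'s are unknown, against all alternatives
can be based on the statistic

$$\lambda(X) = \frac{\prod_{i=1}^{r} (X_{i\cdot}/n)^{X_{i\cdot}}}{\prod_{i=1}^{r}\prod_{j=1}^{c} (X_{ij}/n_j)^{X_{ij}}}.$$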


10.4 t-TESTS

In this section we investigate one of the most frequently used types of tests in statistics,
the tests based on a t-statistic. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, and,
as usual, let us write

$$\bar{X} = n^{-1}\sum_{i=1}^{n} X_i, \qquad S^2 = (n-1)^{-1}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$

The tests for usual null hypotheses about the mean can be derived using the GLR method.
In the following table we summarize the results.


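Reject $H_0$ at level $\alpha$ if:

     | $H_0$ | $H_1$ | $\sigma^2$ known | $\sigma^2$ unknown
I.   | $\mu \leq \mu_0$ | $\mu > \mu_0$ | $\bar{x} \geq \mu_0 + \frac{\sigma}{\sqrt{n}} z_{\alpha}$ | $\bar{x} \geq \mu_0 + \frac{s}{\sqrt{n}} t_{n-1,\alpha}$
II.  | $\mu \geq \mu_0$ | $\mu < \mu_0$ | $\bar{x} \leq \mu_0 + \frac{\sigma}{\sqrt{n}} z_{1-\alpha}$ | $\bar{x} \leq \mu_0 + \frac{s}{\sqrt{n}} t_{n-1,1-\alpha}$
III. | $\mu = \mu_0$ | $\mu \neq \mu_0$ | $\lvert\bar{x} - \mu_0\rvert \geq \frac{\sigma}{\sqrt{n}} z_{\alpha/2}$ | $\lvert\bar{x} - \mu_0\rvert \geq \frac{s}{\sqrt{n}} t_{n-1,\alpha/2}$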


Remark 1. A test based on a t-statistic is called a t-test. The t-tests in I and II are called
one-tailed tests; the t-test in III, a two-tailed test.


Remark 2. If $\sigma^2$ is known, tests I and II are UMP and test III is UMP unbiased. If $\sigma^2$ is
unknown, the t-tests are UMP unbiased and UMP invariant.


Remark 3. If n is large we may use normal tables instead of t-tables. The assumption
of normality may also be dropped because of the central limit theorem. For small samples
care is required in applying the proper test, since the tail probabilities under the normal
distribution and the t-distribution differ significantly for small n (see Remark 6.4.2).


Example 1. Nine determinations of copper in a certain solution yielded a sample mean of
8.3 percent with a standard deviation of 0.025 percent. Let $\mu$ be the mean of the population
of such determinations. Let us test $H_0: \mu \geq 8.42$ against $H_1: \mu < 8.42$ at level $\alpha = 0.05$.
Here $n = 9$, $\bar{x} = 8.3$, $s = 0.025$, $\mu_0 = 8.42$, and $t_{n-1,1-\alpha} = -t_{8,0.05} = -1.860$.
Thus

$$\mu_0 + \frac{s}{\sqrt{n}}\, t_{n-1,1-\alpha} = 8.42 - \frac{0.025}{3}(1.86) = 8.4045.$$

We reject $H_0$ since 8.3 < 8.4045.
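
The arithmetic of this one-sample test is easy to reproduce. The sketch below is only an illustration of the rule in row II of the table; the variable names are ours, and the critical value 1.860 is taken from the t-table as quoted in the example rather than computed.

```python
import math

# Data from the copper example.
n, xbar, s, mu0 = 9, 8.3, 0.025, 8.42
t_8_005 = 1.860                    # tabulated t_{8, 0.05}, as quoted in the text

# Row II: reject H0 when xbar <= mu0 + (s / sqrt(n)) * t_{n-1, 1-alpha},
# where t_{n-1, 1-alpha} = -t_{n-1, alpha}.
cutoff = mu0 - (s / math.sqrt(n)) * t_8_005
reject_h0 = xbar <= cutoff

# Equivalent standardized form of the same comparison.
t_statistic = (xbar - mu0) / (s / math.sqrt(n))

print(round(cutoff, 4), reject_h0, round(t_statistic, 1))
```

Comparing `t_statistic` with `-t_8_005` gives the same decision as comparing `xbar` with `cutoff`.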


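We next consider the two-sample case. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be
independent random samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Let us write

$$\bar{X} = m^{-1}\sum_{i=1}^{m} X_i, \qquad \bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i,$$

$$S_1^2 = (m-1)^{-1}\sum_{i=1}^{m} (X_i - \bar{X})^2, \qquad S_2^2 = (n-1)^{-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2,$$

and

$$S_p^2 = \frac{(m-1)S_1^2 + (n-1)S_2^2}{m+n-2}.$$

$S_p^2$ is sometimes called the pooled sample variance. The following table summarizes the
two-sample tests comparing $\mu_1$ and $\mu_2$: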


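Reject $H_0$ at level $\alpha$ if ($\delta$ = known constant):

     | $H_0$ | $H_1$ | $\sigma_1^2, \sigma_2^2$ known | $\sigma_1^2, \sigma_2^2$ unknown, $\sigma_1 = \sigma_2$
I.   | $\mu_1 - \mu_2 \leq \delta$ | $\mu_1 - \mu_2 > \delta$ | $\bar{x} - \bar{y} \geq \delta + z_{\alpha}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$ | $\bar{x} - \bar{y} \geq \delta + t_{m+n-2,\alpha}\, s_p \sqrt{\frac{1}{m} + \frac{1}{n}}$
II.  | $\mu_1 - \mu_2 \geq \delta$ | $\mu_1 - \mu_2 < \delta$ | $\bar{x} - \bar{y} \leq \delta - z_{\alpha}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$ | $\bar{x} - \bar{y} \leq \delta - t_{m+n-2,\alpha}\, s_p \sqrt{\frac{1}{m} + \frac{1}{n}}$
III. | $\mu_1 - \mu_2 = \delta$ | $\mu_1 - \mu_2 \neq \delta$ | $\lvert\bar{x} - \bar{y} - \delta\rvert \geq z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$ | $\lvert\bar{x} - \bar{y} - \delta\rvert \geq t_{m+n-2,\alpha/2}\, s_p \sqrt{\frac{1}{m} + \frac{1}{n}}$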


Remark 4. The case of most interest is that in which $\delta = 0$. If $\sigma_1^2, \sigma_2^2$ are unknown and
$\sigma_1^2 = \sigma_2^2 = \sigma^2$, $\sigma^2$ unknown, then $S_p^2$ is an unbiased estimate of $\sigma^2$. In this case all the
two-sample t-tests are UMP unbiased and UMP invariant. Before applying the t-test, one
should first make sure that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, $\sigma^2$ unknown. This means applying another test
on the data. We will consider this test in the next section.


Remark 5. If m + n is large, we use normal tables; if both m and n are large, we can drop
the assumption of normality, using the CLT.


Remark 6. The problem of equality of means in sampling from several populations will
be considered in Chapter 12.


Remark 7. The two-sample problem when $\sigma_1 \neq \sigma_2$, both unknown, is commonly referred
to as the Behrens-Fisher problem. The Welch approximate t-test of $H_0: \mu_1 = \mu_2$ is based on
a random number of d.f. f given by

$$f = \left[\left(\frac{R}{1+R}\right)^2 \frac{1}{m-1} + \frac{1}{(1+R)^2}\,\frac{1}{n-1}\right]^{-1},$$

where

$$R = \frac{S_1^2/m}{S_2^2/n},$$

and the t-statistic

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/m + S_2^2/n}}$$

with f d.f. This approximation has been found to be quite good even for small samples.
The formula for f generally leads to noninteger d.f. Linear interpolation in the t-table can be
used to obtain the required percentiles for f d.f.
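
To see how the approximate d.f. behave, the quantity f can be computed from the two sample variances and sample sizes. The sketch below is illustrative only: the function name `welch_df` is ours, and it uses the usual Welch form of f written in terms of the ratio R defined above. The numbers in the usage lines are the summary statistics of the light bulb data of Example 2, used here purely as an illustration.

```python
def welch_df(s1_sq, m, s2_sq, n):
    """Approximate (generally noninteger) d.f. for the Welch t-test.

    s1_sq, s2_sq are the sample variances; m, n are the sample sizes.
    """
    r = (s1_sq / m) / (s2_sq / n)
    inverse_f = (r / (1.0 + r)) ** 2 / (m - 1) + (1.0 / (1.0 + r)) ** 2 / (n - 1)
    return 1.0 / inverse_f

# Equal variances and equal sample sizes give f = 2(m - 1).
f_balanced = welch_df(4.0, 10, 4.0, 10)

# Illustration with s1 = 420, m = 9 and s2 = 390, n = 16.
f_bulbs = welch_df(420.0 ** 2, 9, 390.0 ** 2, 16)
print(f_balanced, round(f_bulbs, 2))
```

In general f lies between min(m - 1, n - 1) and m + n - 2, so the Welch test never uses more d.f. than the pooled test.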


Example 2. The mean life of a sample of 9 light bulbs was observed to be 1309 hours with
a standard deviation of 420 hours. A second sample of 16 bulbs chosen from a different
batch showed a mean life of 1205 hours with a standard deviation of 390 hours. Let us
test to see whether there is a significant difference between the means of the two batches,
assuming that the population variances are the same (see also Example 10.5.1).

Here $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$, $m = 9$, $n = 16$, $\bar{x} = 1309$, $s_1 = 420$, $\bar{y} = 1205$, $s_2 = 390$,
and let us take $\alpha = 0.05$. We have

$$s_p^2 = \frac{8(420)^2 + 15(390)^2}{23},$$

so that

$$t_{m+n-2,\alpha/2}\, s_p \sqrt{\frac{1}{m} + \frac{1}{n}} = t_{23,0.025}\sqrt{\frac{8(420)^2 + 15(390)^2}{23}}\,\sqrt{\frac{1}{9} + \frac{1}{16}} = 345.44.$$

Since $|\bar{x} - \bar{y}| = |1309 - 1205| = 104 \not> 345.44$, we cannot reject $H_0$ at level $\alpha = 0.05$.
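
The computation in this example can be sketched in a few lines. This is an illustration only; the variable names are ours, and the value 2.069 for $t_{23,0.025}$ is the usual t-table entry, supplied here as an assumption rather than computed.

```python
import math

# Summary statistics from the light bulb example.
m, xbar, s1 = 9, 1309.0, 420.0
n, ybar, s2 = 16, 1205.0, 390.0
t_23_0025 = 2.069                  # tabulated t_{23, 0.025} (assumed table value)

# Pooled sample variance.
sp_sq = ((m - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (m + n - 2)

# Two-sided rule (row III with delta = 0): reject when |xbar - ybar| reaches the bound.
bound = t_23_0025 * math.sqrt(sp_sq) * math.sqrt(1.0 / m + 1.0 / n)
reject_h0 = abs(xbar - ybar) >= bound

print(round(bound, 2), reject_h0)
```

The bound agrees with the value 345.44 in the example up to rounding of the table entry, and the observed difference of 104 is well inside it.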


Quite frequently one samples from a bivariate normal population with means $\mu_1, \mu_2$,
variances $\sigma_1^2, \sigma_2^2$, and correlation coefficient $\rho$, the hypothesis of interest being $\mu_1 = \mu_2$.
Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal distribution with
parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$. Then $X_j - Y_j$ is $N(\mu_1 - \mu_2, \sigma^2)$, where
$\sigma^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$. We can therefore treat $D_j = (X_j - Y_j)$, $j = 1, 2, \ldots, n$, as a sample from a
normal population. Let us write

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \quad \text{and} \quad s_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}.$$


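The following table summarizes the resulting tests ($d_0$ = known constant):

     | $H_0$ | $H_1$ | Reject $H_0$ at level $\alpha$ if
I.   | $\mu_1 - \mu_2 \geq d_0$ | $\mu_1 - \mu_2 < d_0$ | $\bar{d} \leq d_0 + \frac{s_d}{\sqrt{n}} t_{n-1,1-\alpha}$
II.  | $\mu_1 - \mu_2 \leq d_0$ | $\mu_1 - \mu_2 > d_0$ | $\bar{d} \geq d_0 + \frac{s_d}{\sqrt{n}} t_{n-1,\alpha}$
III. | $\mu_1 - \mu_2 = d_0$ | $\mu_1 - \mu_2 \neq d_0$ | $\lvert\bar{d} - d_0\rvert \geq \frac{s_d}{\sqrt{n}} t_{n-1,\alpha/2}$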


Remark 8. The case of most importance is that in which $d_0 = 0$. All the t-tests, based
on the $D_j$'s, are UMP unbiased and UMP invariant. If $\sigma$ is known, one can base the test on a
standardized normal RV, but in practice such an assumption is quite unrealistic. If n is large
one can replace t-values by the corresponding critical values under the normal distribution.


Remark 9. Clearly, it is not necessary to assume that $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a sample from
a bivariate normal population. It suffices to assume that the differences $D_j$ form a sample
from a normal population.


Example 3. Nine adults agreed to test the efficacy of a new diet program. Their weights 
(pounds) were measured before and after the program and found to be as follows: 


Participant | 1 2 3 4 5 6 7 8 9
Before | 132 139 126 114 122 132 142 119 126
After | 124 141 118 116 114 132 145 123 121


Let us test the null hypothesis that the diet is not effective, $H_0: \mu_1 - \mu_2 = 0$, against the
alternative, $H_1: \mu_1 - \mu_2 > 0$, that it is effective at level $\alpha = 0.01$. We compute

$$\bar{d} = \frac{8 - 2 + 8 - 2 + 8 + 0 - 3 - 4 + 5}{9} = \frac{18}{9} = 2,$$

$$s_d^2 = 26.75, \qquad s_d = 5.17.$$

Thus

$$d_0 + \frac{s_d}{\sqrt{n}}\, t_{n-1,\alpha} = 0 + \frac{5.17}{\sqrt{9}}\, t_{8,0.01} = \frac{5.17}{3}(2.896) = 4.99.$$

Since $\bar{d} \not> 4.99$, we cannot reject hypothesis $H_0$ that the diet is not very effective.
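
The paired computation can be checked directly from the nine pairs of weights. The sketch below is illustrative; the variable names are ours, and the critical value 2.896 is the table entry $t_{8,0.01}$ quoted in the example.

```python
import math

before = [132, 139, 126, 114, 122, 132, 142, 119, 126]
after = [124, 141, 118, 116, 114, 132, 145, 123, 121]
t_8_001 = 2.896                     # tabulated t_{8, 0.01}, as quoted in the text
d0 = 0.0

# Differences D_j = X_j - Y_j (before minus after).
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = sum(d) / n
s_d_sq = sum((x - d_bar) ** 2 for x in d) / (n - 1)

# Row II of the paired table: reject when d_bar >= d0 + (s_d / sqrt(n)) * t_{n-1, alpha}.
cutoff = d0 + math.sqrt(s_d_sq) / math.sqrt(n) * t_8_001
reject_h0 = d_bar >= cutoff

print(d_bar, s_d_sq, round(cutoff, 2), reject_h0)
```

The mean difference of 2 pounds falls short of the cutoff, which reproduces the conclusion of the example.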


PROBLEMS 10.4 


1. The manufacturer of a certain subcompact car claims that the average mileage of
this model is 30 miles per gallon of regular gasoline. For nine cars of this model
driven in an identical manner, using 1 gallon of regular gasoline, the mean distance
traveled was 26 miles with a standard deviation of 2.8 miles. Test the manufacturer's
claim if you are willing to reject a true claim no more than twice in 100.

2. The nicotine contents of five cigarettes of a certain brand showed a mean of 21.2
milligrams with a standard deviation of 2.05 milligrams. Test the hypothesis that the
average nicotine content of this brand of cigarettes does not exceed 19.7 milligrams.
Use $\alpha = 0.05$.

3. The additional hours of sleep gained by eight patients in an experiment with a certain
drug were recorded as follows:


Patient | 1 2 3 4 5 6 7 8
Hours Gained | 67 =i 34 08 20 61 =02 35 


Assuming that these patients form a random sample from a population of such 
patients and that the number of additional hours gained from the drug is a normal 
random variable, test the hypothesis that the drug has no effect at level a = 0.10. 


4. The mean life of a sample of 8 light bulbs was found to be 1432 hours with a standard
deviation of 436 hours. A second sample of 19 bulbs chosen from a different batch
produced a mean life of 1310 hours with a standard deviation of 382 hours. Making
appropriate assumptions, test the hypothesis that the two samples came from the
same population of light bulbs at level $\alpha = 0.05$.

5. A sample of 25 observations has a mean of 57.6 and a variance of 1.8. A further
sample of 20 values has a mean of 55.4 and a variance of 2.5. Test the hypothesis
that the two samples came from the same normal population.

6. Two methods were used in a study of the latent heat of fusion of ice. Both method A
and method B were conducted with the specimens cooled to -0.72°C. The following
data represent the change in total heat from -0.72°C to water, 0°C, in calories per
gram of mass:


Method A: 79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03,
80.02, 80.00, 80.02
Method B: 80.02, 79.74, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97


Perform a test at level 0.05 to see whether the two methods differ with regard to their
average performance (Natrella [75, p. 3-23]).

7. In Problem 6, if it is known from past experience that the standard deviations of the
two methods are $\sigma_A = 0.024$ and $\sigma_B = 0.033$, test the hypothesis that the methods
are the same with regard to their average performance at level $\alpha = 0.05$.


8. During World War II bacterial polysaccharides were investigated as blood plasma
extenders. Sixteen samples of hydrolyzed polysaccharides supplied by various
manufacturers in order to assess two chemical methods for determining the average
molecular weight yielded the following results:


Method A: 62,700; 29,100; 44,400; 47,800; 36,300; 40,000; 43,400; 35,800;
33,900; 44,200; 34,300; 31,300; 38,400; 47,100; 42,100; 42,200
Method B: 56,400; 27,500; 42,200; 46,800; 33,300; 37,100; 37,300; 36,200;
35,200; 38,000; 32,200; 27,300; 36,100; 43,100; 38,400; 39,900


Perform an appropriate test of the hypothesis that the two averages are the same
against a one-sided alternative that the average of Method A exceeds that of
Method B. Use $\alpha = 0.05$ (Natrella [75, p. 3-38]).


9. The following grade-point averages were collected over a period of 7 years to
determine whether membership in a fraternity is beneficial or detrimental to grades:


Year | 1 2 3 4 5 6 7
Fraternity | 2.4 2.0 2.3 2.1 2.1 2.0 2.0
Nonfraternity | 2.4 2.2 2.5 2.4 2.3 1.8 1.9


Assuming that the populations were normal, test at the 0.025 level of significance
whether membership in a fraternity is detrimental to grades.

10. Consider the two-sample t-statistic $T = (\bar{X} - \bar{Y})/[S_p\sqrt{1/m + 1/n}]$, where $S_p^2 =
[(m-1)S_1^2 + (n-1)S_2^2]/(m+n-2)$. Suppose $\sigma_1 \neq \sigma_2$. Let $m, n \to \infty$ such that
$m/(m+n) \to \rho$. Show that, under $\mu_1 = \mu_2$, $T \xrightarrow{L} U$, where $U \sim N(0, \tau^2)$, where
$\tau^2 = [(1-\rho)\sigma_1^2 + \rho\sigma_2^2]/[\rho\sigma_1^2 + (1-\rho)\sigma_2^2]$. Thus when $m \approx n$, $\rho \approx 1/2$ and $\tau^2 \approx 1$
and T is approximately N(0, 1) as $m (= n) \to \infty$. In this case, a t-test based on T will
have approximately the right level.


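10.5 F-TESTS


The term F-tests refers to tests based on an F-statistic. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$
be independent samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. We recall that
$\sum_{i=1}^{m} (X_i - \bar{X})^2/\sigma_1^2 \sim \chi^2(m-1)$ and $\sum_{i=1}^{n} (Y_i - \bar{Y})^2/\sigma_2^2 \sim \chi^2(n-1)$ are independent RVs,
so that the RV

$$F(X, Y) = \frac{\sum_{i=1}^{m} (X_i - \bar{X})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\cdot\frac{\sigma_2^2 (n-1)}{\sigma_1^2 (m-1)} = \frac{\sigma_2^2}{\sigma_1^2}\,\frac{S_1^2}{S_2^2}$$

is distributed as F(m - 1, n - 1).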
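The following table summarizes the F-tests:

Reject $H_0$ at level $\alpha$ if:

     | $H_0$ | $H_1$ | $\mu_1, \mu_2$ known | $\mu_1, \mu_2$ unknown
I.   | $\sigma_1^2 \leq \sigma_2^2$ | $\sigma_1^2 > \sigma_2^2$ | $\frac{\sum_{1}^{m} (x_i - \mu_1)^2}{\sum_{1}^{n} (y_i - \mu_2)^2} \geq \frac{m}{n} F_{m,n,\alpha}$ | $\frac{s_1^2}{s_2^2} \geq F_{m-1,n-1,\alpha}$
II.  | $\sigma_1^2 \geq \sigma_2^2$ | $\sigma_1^2 < \sigma_2^2$ | $\frac{\sum_{1}^{n} (y_i - \mu_2)^2}{\sum_{1}^{m} (x_i - \mu_1)^2} \geq \frac{n}{m} F_{n,m,\alpha}$ | $\frac{s_2^2}{s_1^2} \geq F_{n-1,m-1,\alpha}$
III. | $\sigma_1^2 = \sigma_2^2$ | $\sigma_1^2 \neq \sigma_2^2$ | $\frac{\sum_{1}^{m} (x_i - \mu_1)^2}{\sum_{1}^{n} (y_i - \mu_2)^2} \geq \frac{m}{n} F_{m,n,\alpha/2}$ or $\leq \frac{m}{n} F_{m,n,1-\alpha/2}$ | $\frac{s_1^2}{s_2^2} \geq F_{m-1,n-1,\alpha/2}$ or $\leq F_{m-1,n-1,1-\alpha/2}$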


Remark 1. Recall (Remark 6.4.5) that

$$F_{m,n,1-\alpha} = \{F_{n,m,\alpha}\}^{-1}.$$


Remark 2. The tests described above can be easily obtained from the likelihood ratio
procedure. Moreover, in the important case where $\mu_1, \mu_2$ are unknown, tests I and II are UMP
unbiased and UMP invariant. For test III we have chosen equal tails, as is customarily done
for convenience even though the unbiasedness property of the test is thereby destroyed.


Example 1 (Example 10.4.2 continued). In Example 10.4.2 let us test the validity of the
assumption on which the t-test was based, namely, that the two populations have the
same variance at level 0.05. We compute $s_1^2/s_2^2 = (420/390)^2 = 196/169 = 1.16$. Since
$F_{m-1,n-1,\alpha/2} = F_{8,15,0.025} = 3.20$, we cannot reject $H_0: \sigma_1 = \sigma_2$.


An important application of the F-test involves the case where one is testing the equality
of means of two normal populations under the assumption that the variances are the same,
that is, testing whether the two samples come from the same population. Let $X_1, X_2, \ldots, X_m$
and $Y_1, Y_2, \ldots, Y_n$ be independent samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. If
$\sigma_1^2 = \sigma_2^2$ but is unknown, the t-test rejects $H_0: \mu_1 = \mu_2$ if $|T| > c$, where c is selected so
that $\alpha_2 = P\{|T| > c \mid \mu_1 = \mu_2, \sigma_1 = \sigma_2\}$, that is, $c = t_{m+n-2,\alpha_2/2}\, s_p \sqrt{1/m + 1/n}$, where

$$s_p^2 = \frac{(m-1)s_1^2 + (n-1)s_2^2}{m+n-2},$$

$s_1^2, s_2^2$ being the sample variances. If first an F-test is performed to test $\sigma_1 = \sigma_2$, and then
a t-test to test $\mu_1 = \mu_2$ at levels $\alpha_1$ and $\alpha_2$, respectively, the probability of accepting both
hypotheses when they are true is

$$P\{|T| \leq c,\; c_1 < F < c_2 \mid \mu_1 = \mu_2, \sigma_1 = \sigma_2\};$$

and if F is independent of T, this probability is $(1 - \alpha_1)(1 - \alpha_2)$. It follows that the
combined test has a significance level $\alpha = 1 - (1 - \alpha_1)(1 - \alpha_2)$. We see that

$$\alpha = \alpha_1 + \alpha_2 - \alpha_1\alpha_2 \leq \alpha_1 + \alpha_2$$

and $\alpha \geq \max(\alpha_1, \alpha_2)$. In fact, $\alpha$ will be closer to $\alpha_1 + \alpha_2$, since for small $\alpha_1$ and $\alpha_2$, $\alpha_1\alpha_2$
will be closer to 0.

We show that F is independent of T whenever $\sigma_1 = \sigma_2$. The statistic $V = (\bar{X}, \bar{Y}, \sum_{i=1}^{m}
(X_i - \bar{X})^2 + \sum_{i=1}^{n} (Y_i - \bar{Y})^2)$ is a complete sufficient statistic for the parameter $(\mu_1, \mu_2,
\sigma_1 = \sigma_2)$ (see Theorem 8.3.2). Since the distribution of F does not depend on $\mu_1$, $\mu_2$,
and $\sigma_1 = \sigma_2$, it follows (Problem 5) that F is independent of V whenever $\sigma_1 = \sigma_2$. But T
is a function of V alone, so that F must be independent of T also.

In Example 1, the combined test has a significance level of

$$\alpha = 1 - (0.95)(0.95) = 1 - 0.9025 = 0.0975.$$
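
The level of the combined procedure, and the bounds on it noted above, can be checked numerically. The sketch below is only an illustration; the function name `combined_level` is ours, and it assumes, as shown above, that F and T are independent under the null hypotheses.

```python
def combined_level(alpha1, alpha2):
    """Level of an F-test at alpha1 followed by a t-test at alpha2,
    assuming the two statistics are independent when both hypotheses hold."""
    return 1.0 - (1.0 - alpha1) * (1.0 - alpha2)

alpha = combined_level(0.05, 0.05)
print(round(alpha, 4))

# The combined level is bracketed by max(alpha1, alpha2) and alpha1 + alpha2.
lower, upper = max(0.05, 0.05), 0.05 + 0.05
```

With both component tests at level 0.05 the combined level is 0.0975, close to the upper bound of 0.10.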


PROBLEMS 10.5 


1. For the data of Problem 10.4.4 is the assumption of equality of variances, on which 
the t-test is based, valid? 

2. Answer the same question for Problems 10.4.5 and 10.4.6. 

3. The performance of each of two different dive bombing methods is measured a dozen 
times. The sample variances for the two methods are computed to be 5545 and 4073, 
respectively. Do the two methods differ in variability? 

4. In Problem 3 does the variability of the first method exceed that of the second 
method? 

5. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from a distribution with PDF (PMF)
$f(x; \theta)$, $\theta \in \Theta$, where $\Theta$ is an interval in $R_k$. Let $T(X)$ be a complete sufficient statistic
for the family $\{f(x; \theta): \theta \in \Theta\}$. If $U(X)$ is a statistic (not a function of T alone)
whose distribution does not depend on $\theta$, show that U is independent of T.


10.6 BAYES AND MINIMAX PROCEDURES 


Let $X_1, X_2, \ldots, X_n$ be a sample from a probability distribution with PDF (PMF) $f_\theta$, $\theta \in \Theta$.
In Section 8.8 we described the general decision problem, namely, once the statistician
observes x, she has a set $\mathcal{A}$ of options available. The problem is to find a decision function
$\delta$ that minimizes the risk $R(\theta, \delta) = E_\theta L(\theta, \delta)$ in some sense. Thus a minimax solution
requires the minimization of $\max_\theta R(\theta, \delta)$, while a Bayes solution requires the minimization
of $R(\pi, \delta) = E R(\theta, \delta)$, where $\pi$ is the a priori distribution on $\Theta$. In Remark 9.2.1
we considered the problem of hypothesis testing as a special case of the general decision
problem. The set $\mathcal{A}$ contains two points, $a_0$ and $a_1$; $a_0$ corresponds to the acceptance of
$H_0: \theta \in \Theta_0$, and $a_1$ corresponds to the rejection of $H_0$. Suppose that the loss function is
defined by

$$\begin{aligned}
L(\theta, a_0) &= a(\theta) \quad \text{if } \theta \in \Theta_1,\; a(\theta) > 0,\\
L(\theta, a_1) &= b(\theta) \quad \text{if } \theta \in \Theta_0,\; b(\theta) > 0,\\
L(\theta, a_0) &= 0 \quad \text{if } \theta \in \Theta_0,\\
L(\theta, a_1) &= 0 \quad \text{if } \theta \in \Theta_1.
\end{aligned} \tag{1}$$

Then

$$R(\theta, \delta(X)) = L(\theta, a_0) P_\theta\{\delta(X) = a_0\} + L(\theta, a_1) P_\theta\{\delta(X) = a_1\} \tag{2}$$

$$= \begin{cases} a(\theta) P_\theta\{\delta(X) = a_0\} & \text{if } \theta \in \Theta_1,\\ b(\theta) P_\theta\{\delta(X) = a_1\} & \text{if } \theta \in \Theta_0. \end{cases} \tag{3}$$


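A minimax solution to the problem of testing $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$, where
$\Theta = \Theta_0 + \Theta_1$, is to find a rule $\delta$ that minimizes

$$\max_\theta\,[a(\theta) P_\theta\{\delta(X) = a_0\},\; b(\theta) P_\theta\{\delta(X) = a_1\}].$$

We will consider here only the special case of testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$.
In that case we want to find a rule $\delta$ which minimizes

$$\max[a P_{\theta_1}\{\delta(X) = a_0\},\; b P_{\theta_0}\{\delta(X) = a_1\}]. \tag{4}$$

We will show that the solution is to reject $H_0$ if

$$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \geq k, \tag{5}$$

provided that the constant k is chosen so that

$$R(\theta_0, \delta(X)) = R(\theta_1, \delta(X)), \tag{6}$$

where $\delta$ is the rule defined in (5); that is, the minimax rule $\delta$ is obtained if we choose k
in (5) so that

$$a P_{\theta_1}\{\delta(X) = a_0\} = b P_{\theta_0}\{\delta(X) = a_1\}, \tag{7}$$

or, equivalently, we choose k so that

$$a P_{\theta_1}\left\{\frac{f_{\theta_1}(X)}{f_{\theta_0}(X)} < k\right\} = b P_{\theta_0}\left\{\frac{f_{\theta_1}(X)}{f_{\theta_0}(X)} \geq k\right\}. \tag{8}$$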


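Let $\delta^*$ be any other rule. If $R(\theta_0, \delta) < R(\theta_0, \delta^*)$, then $R(\theta_0, \delta) = R(\theta_1, \delta) <
\max[R(\theta_0, \delta^*), R(\theta_1, \delta^*)]$ and $\delta^*$ cannot be minimax. Thus, $R(\theta_0, \delta) \geq R(\theta_0, \delta^*)$, which
means that

$$P_{\theta_0}\{\delta^*(X) = a_1\} \leq P_{\theta_0}\{\delta(X) = a_1\} = P\{\text{Reject } H_0 \mid H_0 \text{ true}\}. \tag{9}$$

By the Neyman-Pearson lemma, rule $\delta$ is the most powerful of its size, so that its power
must be at least that of $\delta^*$, that is,

$$P_{\theta_1}\{\delta(X) = a_1\} \geq P_{\theta_1}\{\delta^*(X) = a_1\},$$

so that

$$P_{\theta_1}\{\delta(X) = a_0\} \leq P_{\theta_1}\{\delta^*(X) = a_0\}.$$

It follows that

$$a P_{\theta_1}\{\delta(X) = a_0\} \leq a P_{\theta_1}\{\delta^*(X) = a_0\}$$

and hence that

$$R(\theta_1, \delta) \leq R(\theta_1, \delta^*). \tag{10}$$

This means that

$$\max[R(\theta_0, \delta), R(\theta_1, \delta)] = R(\theta_1, \delta) \leq R(\theta_1, \delta^*)$$

and thus

$$\max[R(\theta_0, \delta), R(\theta_1, \delta)] \leq \max[R(\theta_0, \delta^*), R(\theta_1, \delta^*)].$$

Note that in the discrete case one may need some randomization procedure in order to
achieve equality in (8).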


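Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$ RVs. To test $H_0: \mu = \mu_0$ against $H_1: \mu = \mu_1$
($> \mu_0$), we should choose k so that (8) is satisfied. This is the same as choosing c, and
thus k, so that

$$a P_{\mu_1}\{\bar{X} < c\} = b P_{\mu_0}\{\bar{X} \geq c\}$$

or

$$a P_{\mu_1}\{\sqrt{n}(\bar{X} - \mu_1) < \sqrt{n}(c - \mu_1)\} = b P_{\mu_0}\{\sqrt{n}(\bar{X} - \mu_0) \geq \sqrt{n}(c - \mu_0)\}.$$

Thus

$$a\,\Phi[\sqrt{n}(c - \mu_1)] = b\{1 - \Phi[\sqrt{n}(c - \mu_0)]\},$$

where $\Phi$ is the DF of an N(0, 1) RV. This can easily be accomplished with the help of
normal tables once we know a, b, $\mu_0$, $\mu_1$, and n.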
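
Instead of normal tables, the constant c can be found numerically by equating the two risks. The sketch below is one possible way to do this by bisection and is not part of the text's development; the function names and the numerical values of a, b, $\mu_0$, $\mu_1$, and n in the usage lines are hypothetical.

```python
import math

def std_normal_cdf(z):
    """DF of a standard normal RV, computed via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def risk_gap(c, a, b, mu0, mu1, n):
    """a * P_{mu1}(Xbar < c) minus b * P_{mu0}(Xbar >= c) for iid N(mu, 1) data."""
    root_n = math.sqrt(n)
    risk_under_h1 = a * std_normal_cdf(root_n * (c - mu1))
    risk_under_h0 = b * (1.0 - std_normal_cdf(root_n * (c - mu0)))
    return risk_under_h1 - risk_under_h0

def minimax_cutoff(a, b, mu0, mu1, n, tol=1e-12):
    """Bisection for the c that equalizes the two risks (the gap increases in c)."""
    lo = mu0 - 10.0 / math.sqrt(n)
    hi = mu1 + 10.0 / math.sqrt(n)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if risk_gap(mid, a, b, mu0, mu1, n) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical inputs: equal losses, then a larger loss for wrongly accepting H0.
c_equal = minimax_cutoff(1.0, 1.0, 0.0, 1.0, 9)
c_unequal = minimax_cutoff(2.0, 1.0, 0.0, 1.0, 9)
print(round(c_equal, 6), round(c_unequal, 4))
```

When a = b the two error probabilities must be equal, so by symmetry the cutoff is the midpoint of $\mu_0$ and $\mu_1$; a larger a moves the cutoff toward $\mu_0$.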


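We next consider the problem of testing $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ from a
Bayesian point of view. Let $\pi(\theta)$ be the a priori probability distribution on $\Theta$.
Then

$$R(\pi, \delta) = E_\pi R(\theta, \delta)
= \begin{cases} \int_\Theta R(\theta, \delta)\pi(\theta)\,d\theta & \text{if } \pi \text{ is a PDF},\\ \sum_\Theta R(\theta, \delta)\pi(\theta) & \text{if } \pi \text{ is a PMF}, \end{cases}$$

$$= \begin{cases} \int_{\Theta_0} b(\theta)\pi(\theta) P_\theta\{\delta(X) = a_1\}\,d\theta + \int_{\Theta_1} a(\theta)\pi(\theta) P_\theta\{\delta(X) = a_0\}\,d\theta & \text{if } \pi \text{ is a PDF},\\ \sum_{\Theta_0} b(\theta)\pi(\theta) P_\theta\{\delta(X) = a_1\} + \sum_{\Theta_1} a(\theta)\pi(\theta) P_\theta\{\delta(X) = a_0\} & \text{if } \pi \text{ is a PMF}. \end{cases} \tag{11}$$

The Bayes solution is a decision rule that minimizes $R(\pi, \delta)$. In what follows we restrict
our attention to the case where both $H_0$ and $H_1$ have exactly one point each, that is,
$\Theta_0 = \{\theta_0\}$, $\Theta_1 = \{\theta_1\}$. Let $\pi(\theta_0) = \pi_0$ and $\pi(\theta_1) = 1 - \pi_0 = \pi_1$. Then

$$R(\pi, \delta) = b\pi_0 P_{\theta_0}\{\delta(X) = a_1\} + a\pi_1 P_{\theta_1}\{\delta(X) = a_0\}, \tag{12}$$

where $b(\theta_0) = b$, $a(\theta_1) = a$ ($a, b > 0$).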


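Theorem 1. Let $X = (X_1, X_2, \ldots, X_n)$ be an RV of the discrete (continuous) type with
PMF (PDF) $f_\theta$, $\theta \in \Theta = \{\theta_0, \theta_1\}$. Let $\pi(\theta_0) = \pi_0$, $\pi(\theta_1) = 1 - \pi_0 = \pi_1$ be the a priori
probability mass function on $\Theta$. A Bayes solution for testing $H_0: X \sim f_{\theta_0}$ against $H_1: X \sim f_{\theta_1}$,
using the loss function (1), is to reject $H_0$ if

$$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \geq \frac{b\pi_0}{a\pi_1}. \tag{13}$$


Proof. We wish to find $\delta$ which minimizes

$$R(\pi, \delta) = b\pi_0 P_{\theta_0}\{\delta(X) = a_1\} + a\pi_1 P_{\theta_1}\{\delta(X) = a_0\}.$$

Now

$$R(\pi, \delta) = E_\pi R(\theta, \delta) = E\{E_\theta\{L(\theta, \delta) \mid X\}\},$$

so it suffices to minimize $E_\theta\{L(\theta, \delta) \mid X\}$.
The a posteriori distribution of $\theta$ is given by

$$h(\theta \mid x) = \frac{\pi(\theta) f_\theta(x)}{\sum_\theta f_\theta(x)\pi(\theta)} = \frac{\pi(\theta) f_\theta(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)}
= \begin{cases} \dfrac{\pi_0 f_{\theta_0}(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)} & \text{if } \theta = \theta_0,\\[2ex] \dfrac{\pi_1 f_{\theta_1}(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)} & \text{if } \theta = \theta_1. \end{cases} \tag{14}$$

Thus

$$E_\theta\{L(\theta, \delta(X)) \mid X = x\} = \begin{cases} b\,h(\theta_0 \mid x), & \theta = \theta_0,\; \delta(X) = a_1,\\ a\,h(\theta_1 \mid x), & \theta = \theta_1,\; \delta(X) = a_0. \end{cases}$$

It follows that we reject $H_0$, that is, $\delta(X) = a_1$, if

$$b\,h(\theta_0 \mid x) \leq a\,h(\theta_1 \mid x),$$

which is the case if and only if

$$b\pi_0 f_{\theta_0}(x) \leq a\pi_1 f_{\theta_1}(x),$$

as asserted.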


Remark 1. In the Neyman-Pearson lemma we fixed $P_{\theta_0}\{\delta(X) = a_1\}$, the probability of
rejecting $H_0$ when it is true, and minimized $P_{\theta_1}\{\delta(X) = a_0\}$, the probability of accepting
$H_0$ when it is false. Here we no longer have a fixed level $\alpha$ for $P_{\theta_0}\{\delta(X) = a_1\}$. Instead
we allow it to assume any value as long as $R(\pi, \delta)$, defined in (12), is minimum.


Remark 2. It is easy to generalize Theorem 1 to the case of multiple decisions. Let X be
an RV with PDF (PMF) $f_\theta$, where $\theta$ can take any of the k values $\theta_1, \theta_2, \ldots, \theta_k$. The problem
is to observe x and decide which of the $\theta_i$'s is the correct value of $\theta$. Let us write $H_i: \theta = \theta_i$,
$i = 1, 2, \ldots, k$, and assume that $\pi(\theta_i) = \pi_i$, $i = 1, 2, \ldots, k$, $\sum_{i=1}^{k} \pi_i = 1$, is the prior probability
distribution on $\Theta = \{\theta_1, \theta_2, \ldots, \theta_k\}$. Let

$$L(\theta_i, \delta) = \begin{cases} 1 & \text{if } \delta \text{ chooses } \theta_j,\; j \neq i,\\ 0 & \text{if } \delta \text{ chooses } \theta_i. \end{cases}$$

The problem is to find a rule $\delta$ that minimizes $R(\pi, \delta)$. We leave the reader to show that a
Bayes solution is to accept $H_i: \theta = \theta_i$ ($i = 1, 2, \ldots, k$) if

$$\pi_i f_{\theta_i}(x) \geq \pi_j f_{\theta_j}(x) \quad \text{for all } j \neq i,\; j = 1, 2, \ldots, k, \tag{15}$$

where any point lying in more than one such region is assigned to any one of them.

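Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$ RVs. To test $H_0: \mu = \mu_0$ against $H_1: \mu = \mu_1$
($> \mu_0$), let us take a = b in the loss function (1). Then Theorem 1 says that the Bayes rule
is one that rejects $H_0$ if

$$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \geq \frac{\pi_0}{1 - \pi_0},$$

that is,

$$\exp\left\{-\frac{\sum_{i=1}^{n} (x_i - \mu_1)^2}{2} + \frac{\sum_{i=1}^{n} (x_i - \mu_0)^2}{2}\right\} \geq \frac{\pi_0}{1 - \pi_0},$$

$$\exp\left\{(\mu_1 - \mu_0)\sum_{i=1}^{n} x_i + \frac{n(\mu_0^2 - \mu_1^2)}{2}\right\} \geq \frac{\pi_0}{1 - \pi_0}.$$

This happens if and only if

$$\bar{x} \geq \frac{1}{n}\,\frac{\log[\pi_0/(1 - \pi_0)]}{\mu_1 - \mu_0} + \frac{\mu_0 + \mu_1}{2},$$

where the logarithm is to the base e. It follows that, if $\pi_0 = \frac{1}{2}$, the rejection region
consists of

$$\bar{x} \geq \frac{\mu_0 + \mu_1}{2}.$$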
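
The equivalence between the likelihood-ratio form of the Bayes rule and the cutoff on the sample mean can be checked numerically. The sketch below is illustrative only; the function names and the small data set are hypothetical, and the loss constants are taken equal (a = b) as in the example.

```python
import math

def bayes_cutoff(mu0, mu1, n, pi0):
    """Cutoff on the sample mean for the Bayes test with a = b and N(mu, 1) data."""
    return math.log(pi0 / (1.0 - pi0)) / (n * (mu1 - mu0)) + (mu0 + mu1) / 2.0

def bayes_rejects(xs, mu0, mu1, pi0):
    """Bayes rule in likelihood-ratio form: reject H0 when f1/f0 >= pi0/(1 - pi0)."""
    log_ratio = sum(-(x - mu1) ** 2 / 2.0 + (x - mu0) ** 2 / 2.0 for x in xs)
    return log_ratio >= math.log(pi0 / (1.0 - pi0))

# Hypothetical sample of size 4 with mu0 = 0 and mu1 = 1.
sample = [0.2, 0.9, 0.4, 1.1]
mean = sum(sample) / len(sample)

for prior in (0.5, 0.7):
    cutoff = bayes_cutoff(0.0, 1.0, len(sample), prior)
    print(prior, round(cutoff, 4), bayes_rejects(sample, 0.0, 1.0, prior), mean >= cutoff)
```

A larger prior probability $\pi_0$ for $H_0$ raises the cutoff, so the same sample that leads to rejection under $\pi_0 = 1/2$ need not do so under $\pi_0 = 0.7$.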


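Example 3. This example illustrates the result described in Remark 2. Let $X_1, X_2, \ldots, X_n$
be a sample from $N(\mu, 1)$ and suppose that $\mu$ can take any one of the three values $\mu_1$,
$\mu_2$, or $\mu_3$. Let $\mu_1 < \mu_2 < \mu_3$. Assume, for simplicity, that $\pi_1 = \pi_2 = \pi_3$. Then we accept
$H_i: \mu = \mu_i$, $i = 1, 2, 3$, if

$$\pi_i \exp\left\{-\sum_{k=1}^{n} \frac{(x_k - \mu_i)^2}{2}\right\} \geq \pi_j \exp\left\{-\sum_{k=1}^{n} \frac{(x_k - \mu_j)^2}{2}\right\}$$

for each $j \neq i$, $j = 1, 2, 3$.

It follows that we accept $H_i$ if

$$(\mu_i - \mu_j)\bar{x} + \frac{\mu_j^2 - \mu_i^2}{2} \geq 0, \quad j = 1, 2, 3\; (j \neq i),$$

that is,

$$\bar{x}(\mu_i - \mu_j) \geq \frac{(\mu_i - \mu_j)(\mu_i + \mu_j)}{2}, \quad j = 1, 2, 3\; (j \neq i).$$

Thus, the acceptance region of $H_1$ is given by

$$\bar{x} \leq \frac{\mu_1 + \mu_2}{2} \quad \text{and} \quad \bar{x} \leq \frac{\mu_1 + \mu_3}{2},$$

that of $H_2$ by

$$\bar{x} \geq \frac{\mu_1 + \mu_2}{2} \quad \text{and} \quad \bar{x} \leq \frac{\mu_2 + \mu_3}{2},$$

and that of $H_3$ by

$$\bar{x} \geq \frac{\mu_1 + \mu_3}{2} \quad \text{and} \quad \bar{x} \geq \frac{\mu_2 + \mu_3}{2}.$$

In particular, if $\mu_1 = 0$, $\mu_2 = 2$, $\mu_3 = 4$, we accept $H_1$ if $\bar{x} \leq 1$, $H_2$ if $1 \leq \bar{x} \leq 3$, and $H_3$
if $\bar{x} \geq 3$. In this case, boundary points 1 and 3 have zero probability, and it does not matter
where we include them.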
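
The multiple decision rule of Remark 2 amounts to choosing the hypothesis with the largest value of $\pi_i f_{\theta_i}(x)$. The sketch below illustrates this for normal data with unit variance; the function name `bayes_choice` and the sample values are hypothetical, and ties are simply broken in favor of the first maximizer.

```python
import math

def bayes_choice(xs, mus, priors):
    """Index i maximizing pi_i * f_{mu_i}(x) for iid N(mu, 1) observations xs.

    Works on the log scale; the common normalizing constant is dropped.
    """
    scores = [
        math.log(p) - sum((x - mu) ** 2 for x in xs) / 2.0
        for mu, p in zip(mus, priors)
    ]
    return max(range(len(mus)), key=scores.__getitem__)

mus = (0.0, 2.0, 4.0)
priors = (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)

# Sample means 0.5, 2.4, and 3.5 fall in the three acceptance regions in turn.
choices = [bayes_choice([x], mus, priors) for x in (0.5, 2.4, 3.5)]
print(choices)
```

With equal priors the rule reduces to picking the mean closest to $\bar{x}$, which matches the cut points 1 and 3 found in the example.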


PROBLEMS 10.6 


1. In Example 1 let $n = 15$, $\mu_0 = 4.7$, and $\mu_1 = 5.2$, and choose $a = b > 0$. Find the
minimax test, and compute its power at $\mu = 4.7$ and $\mu = 5.2$.


. A sample of five observations is taken on a b(1,0) RV to test Ho: 0 = 5 against 


A, :0= 3. 
(a) Find the most powerful test of size a = 0.05. 
(b) If L(1,1) = 1(2, 2) =0, L(1, 2) = 1, and L(2, }) =2, find the minimax tule. 
22 44 294 472 
(c) If the prior probabilities of 6 = 5 and 6 = 3 are 7 = ; and 7; = , respectively, 
find the Bayes rule. 


. A sample of size n is to be used from the PDF 


paste”, x>0, 


to test Hy: 0 = 1 against H,: 0 = 2. If the a priori distribution on 6 is 79 = i, Ty = i 


and a = b, find the Bayes solution. Find the power of the test at @ = 1 and @ = 2. 


4. Given two normal densities with variances 1 and with means -1 and 1, respectively,
find the Bayes solution based on a single observation when a = b and (a) $\pi_0 =
\pi_1 = \frac{1}{2}$, and (b) $\pi_0 = \frac{1}{4}$, $\pi_1 = \frac{3}{4}$.


. Given three normal densities with variances 1 and with means —1, 0, 1, respec- 


tively, find the Bayes solution to the multiple decision problem based on a single 
observation when 77, 


2 1 
59 12 = 5573 = 5- 


6. For the multiple decision problem described in Remark 2 show that a Bayes solution
is to accept $H_i: \theta = \theta_i$ ($i = 1, 2, \ldots, k$) if (15) holds.


11 


CONFIDENCE ESTIMATION 


11.1 INTRODUCTION 


In many problems of statistical inference the experimenter is interested in constructing
a family of sets that contain the true (unknown) parameter value with a specified (high)
probability. If X, for example, represents the length of life of a piece of equipment, the
experimenter is interested in a lower bound $\underline{\theta}$ for the mean $\theta$ of X. Since $\underline{\theta} = \underline{\theta}(X)$ will be
a function of the observations, one cannot ensure with probability 1 that $\underline{\theta}(X) \leq \theta$. All that
one can do is to choose a number $1 - \alpha$ that is close to 1 so that $P_\theta\{\underline{\theta}(X) \leq \theta\} \geq 1 - \alpha$ for
all $\theta$. Problems of this type are called problems of confidence estimation. In this chapter
we restrict ourselves mostly to the case where $\Theta \subseteq \mathcal{R}$ and consider the problem of setting
confidence limits for the parameter $\theta$.

In Section 11.2 we introduce the basic ideas of confidence estimation. Section 11.3 
deals with various methods of finding confidence intervals, and Section 11.4 deals with 
shortest-length confidence intervals. In Section 11.5 we study unbiased and equivariant 
confidence intervals. 


11.2 SOME FUNDAMENTAL NOTIONS OF CONFIDENCE ESTIMATION 


So far we have considered a random variable or some function of it as the basic observable 
quantity. Let X be an RV, and a, b, be two given positive real numbers. Then 


An Introduction to Probability and Statistics, Third Edition. Vijay K. Rohatgi and A.K. Md. Ehsanes Saleh. 
© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc. 




$$P\{a < X < b\} = P\{a < X \text{ and } X < b\} = P\left\{X < b < \frac{bX}{a}\right\},$$

and if we know the distribution of X and a, b, we can determine the probability $P\{a <
X < b\}$. Consider the interval $I(X) = (X, bX/a)$. This is an interval with end points that
are functions of the RV X, and hence it takes the value $(x, bx/a)$ when X takes the value x.
In other words, $I(X)$ assumes the value $I(x)$ whenever X assumes the value x. Thus $I(X)$
is a random quantity and is an example of a random interval. Note that $I(X)$ includes the
value b with a certain fixed probability. For example, if $b = 1$, $a = \frac{1}{2}$, and X is U(0, 1),
the interval $(X, 2X)$ includes point 1 with probability $\frac{1}{2}$. We note that $I(X)$ is a family
of intervals with associated coverage probability $P(I(X) \ni 1) = \frac{1}{2}$. It has (random) length
$\ell(I(X)) = 2X - X = X$. In general the larger the length of the interval the larger the coverage
probability. Let us formalize these notions.
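
The coverage probability of this random interval can be illustrated by simulation. The sketch below is only an illustration; the function name `coverage_of_one`, the number of trials, and the seed are arbitrary choices of ours.

```python
import random

def coverage_of_one(trials, seed=0):
    """Fraction of simulated intervals (X, 2X), X ~ U(0, 1), that contain the point 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.random()
        if x < 1.0 < 2.0 * x:      # the realized interval (x, 2x) covers 1
            hits += 1
    return hits / trials

estimate = coverage_of_one(200_000)
print(round(estimate, 3))
```

The interval covers 1 exactly when X > 1/2, so the simulated fraction should be close to 1/2.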


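Definition 1. Let $P_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$, be the set of probability distributions of an RV X. A
family of subsets $S(x)$ of $\Theta$, where $S(x)$ depends on the observation x but not on $\theta$, is
called a family of random sets. If, in particular, $\Theta \subseteq \mathcal{R}$ and $S(x)$ is an interval $(\underline{\theta}(x), \bar{\theta}(x))$,
where $\underline{\theta}(x)$ and $\bar{\theta}(x)$ are functions of x alone (and not $\theta$), we call $S(X)$ a random interval
with $\underline{\theta}(X)$ and $\bar{\theta}(X)$ as lower and upper bounds, respectively. $\underline{\theta}(X)$ may be $-\infty$, and
$\bar{\theta}(X)$ may be $+\infty$.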


In a wide variety of inference problems one is not interested in estimating the parameter 
or testing some hypothesis concerning it. Rather, one wishes to establish a lower or an 
upper bound, or both, for the real-valued parameter. For example, if X is the time to failure 
of a piece of equipment, one may be interested in a lower bound for the mean of X. If the 
RV X measures the toxicity of a drug, the concern is to find an upper bound for the mean. 
Similarly, if the RV X measures the nicotine content of a certain brand of cigarettes, one 
may be interested in determining an upper and a lower bound for the average nicotine 
content of these cigarettes. 

In this chapter we are interested in the problem of confidence estimation, namely, that of
finding a family of random sets $S(x)$ for a parameter $\theta$ such that, for a given $\alpha$, $0 < \alpha < 1$
(usually small),

$$P_\theta\{S(X) \ni \theta\} \geq 1 - \alpha \quad \text{for all } \theta \in \Theta. \tag{1}$$

We restrict our attention mainly to the case where $\theta \in \Theta \subseteq \mathcal{R}$.


Definition 2. Let $\theta \in \Theta \subseteq \mathcal{R}$ and $0 < \alpha < 1$. A statistic $\underline{\theta}(X)$ satisfying

$$P_\theta\{\underline{\theta}(X) \leq \theta\} \geq 1 - \alpha \quad \text{for all } \theta \tag{2}$$

is called a lower confidence bound for $\theta$ at confidence level $1 - \alpha$. The quantity

$$\inf_{\theta \in \Theta} P_\theta\{\underline{\theta}(X) \leq \theta\} \tag{3}$$

is called the confidence coefficient.


Definition 3. A statistic $\underline{\theta}$ that minimizes

$$P_\theta\{\underline{\theta}(X) \leq \theta'\} \quad \text{for all } \theta' < \theta \tag{4}$$

subject to (2) is known as a uniformly most accurate (UMA) lower confidence bound for
$\theta$ at confidence level $1 - \alpha$.


Remark 1. Suppose $X \sim P_\theta$ and (2) holds. Then the smallest probability of true coverage,
$P_\theta\{\underline{\theta}(X) \leq \theta\} = P_\theta\{[\underline{\theta}(X), \infty) \ni \theta\}$, is $1 - \alpha$. Then the probability of false (or incorrect)
coverage is $P_\theta\{[\underline{\theta}(X), \infty) \ni \theta'\} = P_\theta\{\underline{\theta}(X) \leq \theta'\}$ for $\theta' < \theta$. According to Definition 3,
among the class of all lower confidence bounds satisfying (2), a UMA lower confidence
bound has the smallest probability of false coverage.


Similar definitions are given for an upper confidence bound for $\theta$ and a UMA upper
confidence bound.


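Definition 4. A family of subsets $S(x)$ of $\Theta \subseteq \mathcal{R}_k$ is said to constitute a family of
confidence sets at confidence level $1-\alpha$ if

$$P_\theta\{S(X) \ni \theta\} \ge 1-\alpha \quad \text{for all } \theta \in \Theta, \tag{5}$$

that is, the random set $S(X)$ covers the true parameter value $\theta$ with probability $\ge 1-\alpha$.
A lower confidence bound corresponds to the special case where $k = 1$ and

$$S(x) = \{\theta : \underline{\theta}(x) \le \theta < \infty\}; \tag{6}$$

and an upper confidence bound, to the case where

$$S(x) = \{\theta : \overline{\theta}(x) \ge \theta > -\infty\}. \tag{7}$$

If $S(x)$ is of the form

$$S(x) = (\underline{\theta}(x), \overline{\theta}(x)) \tag{8}$$

we will call it a confidence interval at confidence level $1-\alpha$, provided that

$$P_\theta\{\underline{\theta}(X) < \theta < \overline{\theta}(X)\} \ge 1-\alpha \quad \text{for all } \theta, \tag{9}$$

and the quantity

$$\inf_\theta P_\theta\{\underline{\theta}(X) < \theta < \overline{\theta}(X)\} \tag{10}$$

will be referred to as the confidence coefficient associated with the random interval.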


Remark 2. We write $S(X) \ni \theta$ to indicate that $X$, and hence $S(X)$, is random here and
not $\theta$, so the probability distribution referred to is that of $X$.

Remark 3. When $X = x$ is the realization, the confidence interval (set) $S(x)$ is a fixed
subset of $\mathcal{R}_k$. No probability is attached to $S(x)$ itself since neither $\theta$ nor $S(x)$ has a probability
distribution. In fact either $S(x)$ covers $\theta$ or it does not, and we will never know which since
$\theta$ is unknown. One can give a relative frequency interpretation. If $(1-\alpha)$-level confidence
sets for $\theta$ were computed a large number of times, then a fraction (approximately) $1-\alpha$
of these would contain the true (but unknown) parameter value.
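
The relative frequency interpretation in Remark 3 can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it uses the familiar known-variance normal interval $\overline{X} \pm z_{\alpha/2}\sigma/\sqrt{n}$, and the function name `coverage_fraction`, the seed, and all numerical settings are hypothetical choices, not part of the text.

```python
import random
from statistics import NormalDist, fmean

def coverage_fraction(mu, sigma, n, alpha, reps, seed=0):
    """Fraction of simulated known-sigma normal intervals that contain mu."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    half = z * sigma / n ** 0.5               # half-width, identical in every replication
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half < mu < xbar + half:    # did this realized interval cover mu?
            hits += 1
    return hits / reps

# Each realized interval either covers mu or it does not; only the long-run
# fraction is (approximately) 1 - alpha.
print(coverage_fraction(mu=5.0, sigma=2.0, n=10, alpha=0.05, reps=2000))
```

The printed fraction should be close to 0.95 but will vary with the seed and the number of replications.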


Definition 5. A family of $(1-\alpha)$-level confidence sets $\{S(x)\}$ is said to be a UMA family
of confidence sets at level $1-\alpha$ if

$$P_\theta\{S(X) \text{ contains } \theta'\} \le P_\theta\{S'(X) \text{ contains } \theta'\}$$

for all $\theta \ne \theta'$ and any $(1-\alpha)$-level family of confidence sets $S'(X)$.


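Example 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs, $X_i \sim \mathcal{N}(\mu, \sigma^2)$. Consider the interval
$(\overline{X} - c_1, \overline{X} + c_2)$. In order for this to be a $(1-\alpha)$-level confidence interval, we must have

$$P\{\overline{X} - c_1 < \mu < \overline{X} + c_2\} \ge 1-\alpha,$$

which is the same as

$$P\{\mu - c_2 < \overline{X} < \mu + c_1\} \ge 1-\alpha.$$

Thus

$$P\left\{-\frac{c_2}{\sigma}\sqrt{n} < \frac{\overline{X}-\mu}{\sigma}\sqrt{n} < \frac{c_1}{\sigma}\sqrt{n}\right\} \ge 1-\alpha.$$

Since $\sqrt{n}(\overline{X}-\mu)/\sigma \sim \mathcal{N}(0,1)$, we can choose $c_1$ and $c_2$ to have equality, namely,

$$P\left\{-\frac{c_2}{\sigma}\sqrt{n} < \frac{\overline{X}-\mu}{\sigma}\sqrt{n} < \frac{c_1}{\sigma}\sqrt{n}\right\} = 1-\alpha,$$

provided that $\sigma$ is known. There are infinitely many such pairs of values $(c_1, c_2)$. In
particular, an intuitively reasonable choice is the symmetric one, $c_1 = c_2 = c$, say. In that case

$$\frac{c\sqrt{n}}{\sigma} = z_{\alpha/2},$$

and the confidence interval is $(\overline{X} - (\sigma/\sqrt{n})z_{\alpha/2}, \overline{X} + (\sigma/\sqrt{n})z_{\alpha/2})$. The length of this
interval is $(2\sigma/\sqrt{n})z_{\alpha/2}$. Given $\sigma$ and $\alpha$, we can choose $n$ to get a confidence interval of a fixed
length.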
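
As a numerical companion to the known-$\sigma$ case, the following minimal sketch computes the symmetric interval $\overline{X} \pm (\sigma/\sqrt{n})z_{\alpha/2}$. The function name `z_interval` and the input values are hypothetical; $z_{\alpha/2}$ is taken as the upper $\alpha/2$ point of $\mathcal{N}(0,1)$, obtained from Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha):
    """Symmetric (1 - alpha)-level interval for a normal mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}: upper alpha/2 point
    half = z * sigma / n ** 0.5               # (sigma / sqrt(n)) * z_{alpha/2}
    return xbar - half, xbar + half

# Illustrative numbers only (not taken from the text).
lo, hi = z_interval(xbar=10.0, sigma=2.0, n=16, alpha=0.05)
print(lo, hi)   # length hi - lo equals (2 * sigma / sqrt(n)) * z_{alpha/2}
```

The length depends on the data only through $n$, which is why a fixed-length interval can be planned in advance when $\sigma$ is known.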

If $\sigma$ is not known, we have from

$$P\{-c_2 < \overline{X} - \mu < c_1\} \ge 1-\alpha$$

that

$$P\left\{-\frac{c_2}{S}\sqrt{n} < \frac{\overline{X}-\mu}{S}\sqrt{n} < \frac{c_1}{S}\sqrt{n}\right\} \ge 1-\alpha,$$

and once again we can choose pairs of values $(c_1, c_2)$ using a $t$-distribution with $n-1$ d.f.
such that

$$P\left\{-\frac{c_2\sqrt{n}}{S} < \frac{\overline{X}-\mu}{S}\sqrt{n} < \frac{c_1\sqrt{n}}{S}\right\} = 1-\alpha.$$

In particular, if we take $c_1 = c_2 = c$, say, then

$$\frac{c\sqrt{n}}{S} = t_{n-1,\alpha/2},$$

and $(\overline{X} - (S/\sqrt{n})t_{n-1,\alpha/2}, \overline{X} + (S/\sqrt{n})t_{n-1,\alpha/2})$ is a $(1-\alpha)$-level confidence interval for $\mu$.
The length of this interval is $(2S/\sqrt{n})t_{n-1,\alpha/2}$, which is no longer constant. Therefore we
cannot choose $n$ to get a fixed-width confidence interval of level $1-\alpha$. Indeed, the length
of this interval can be quite large if $\sigma$ is large. Its expected length is

$$\frac{2}{\sqrt{n}}\, t_{n-1,\alpha/2}\, E_\sigma S = \frac{2}{\sqrt{n}}\, t_{n-1,\alpha/2} \sqrt{\frac{2}{n-1}}\, \frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}\, \sigma,$$

which can be made as small as we please by choosing $n$ large enough.


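Example 2. In Example 1, suppose that we wish to find a confidence interval for $\sigma^2$
instead when $\mu$ is unknown. Consider the interval $(c_1 S^2, c_2 S^2)$, $c_1, c_2 > 0$. We have

$$P\{c_1 S^2 < \sigma^2 < c_2 S^2\} \ge 1-\alpha,$$

so that

$$P\left\{\frac{1}{c_2} < \frac{S^2}{\sigma^2} < \frac{1}{c_1}\right\} \ge 1-\alpha.$$

Since $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$, we can choose pairs of values $(c_1, c_2)$ from the tables of
the chi-square distribution. In particular, we can choose $c_1$, $c_2$ so that

$$P\left\{\frac{S^2}{\sigma^2} \ge \frac{1}{c_1}\right\} = \frac{\alpha}{2} = P\left\{\frac{S^2}{\sigma^2} \le \frac{1}{c_2}\right\}.$$

Then

$$\frac{n-1}{c_1} = \chi^2_{n-1,\alpha/2} \quad \text{and} \quad \frac{n-1}{c_2} = \chi^2_{n-1,1-\alpha/2}.$$

Thus

$$\left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right)$$

is a $(1-\alpha)$-level confidence interval for $\sigma^2$ whenever $\mu$ is unknown. If $\mu$ is known, then

$$\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n).$$

Thus we can base the confidence interval on $\sum_{i=1}^{n}(X_i - \mu)^2$. Proceeding similarly, we get a
$(1-\alpha)$-level confidence interval as

$$\left(\frac{\sum_{i=1}^{n}(X_i-\mu)^2}{\chi^2_{n,\alpha/2}},\ \frac{\sum_{i=1}^{n}(X_i-\mu)^2}{\chi^2_{n,1-\alpha/2}}\right).$$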


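Next suppose that both $\mu$ and $\sigma^2$ are unknown and that we want a confidence set for
$(\mu, \sigma^2)$. We have from Boole's inequality

$$P\left\{\overline{X} - \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2} < \mu < \overline{X} + \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2},\ \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}}\right\}$$

$$\ge 1 - P\left\{\overline{X} + \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2} \le \mu \ \text{ or } \ \overline{X} - \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2} \ge \mu\right\}$$

$$\quad - P\left\{\frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}} \le \sigma^2 \ \text{ or } \ \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} \ge \sigma^2\right\}$$

$$= 1 - \alpha_1 - \alpha_2,$$

so that the Cartesian product,

$$S(X) = \left(\overline{X} - \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2},\ \overline{X} + \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2}\right) \times \left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}}\right),$$

is a $(1-\alpha_1-\alpha_2)$-level confidence set for $(\mu, \sigma^2)$.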


11.3 METHODS OF FINDING CONFIDENCE INTERVALS


We now consider some common methods of constructing confidence sets. The most 
common of these is the method of pivots. 


Definition 1. Let $X \sim P_\theta$. A random variable $T(X, \theta)$ is known as a pivot if the
distribution of $T(X, \theta)$ does not depend on $\theta$.


In many problems, especially in location and scale problems, pivots are easily found.
For example, in sampling from $f(x-\theta)$, $X_{(n)} - \theta$ is a pivot and so is $\overline{X} - \theta$. In
sampling from $(1/\sigma)f(x/\sigma)$, a scale family, $X_{(n)}/\sigma$ is a pivot and so is $X_{(1)}/\sigma$, and in
sampling from $(1/\sigma)f((x-\theta)/\sigma)$, a location-scale family, $(\overline{X} - \theta)/S$ is a pivot, and so
is $(X_{(2)} + X_{(1)} - 2\theta)/S$.

If the DF $F_\theta$ of $X_i$ is continuous, then $F_\theta(X_i) \sim U[0,1]$ and, in case of random sampling,
we can take

$$T(X, \theta) = \prod_{i=1}^{n} F_\theta(X_i)$$

or,

$$-\log T(X, \theta) = -\sum_{i=1}^{n} \log F_\theta(X_i)$$

as a pivot. Since $F_\theta(X_i) \sim U[0,1]$, $-\log F_\theta(X_i) \sim G(1,1)$ and $-\sum_{i=1}^{n} \log F_\theta(X_i) \sim G(n,1)$.
It follows that $-\sum_{i=1}^{n} \log F_\theta(X_i)$ is a pivot.

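The following result gives a simple sufficient condition for a pivot to yield a confidence
interval for a real-valued parameter $\theta$.


Theorem 1. Let $T(X, \theta)$ be a pivot such that for each $\theta$, $T(X, \theta)$ is a statistic, and as a
function of $\theta$, $T$ is either strictly increasing or decreasing at each $x \in \mathcal{R}_n$. Let $\Lambda \subseteq \mathcal{R}$ be
the range of $T$, and for every $\lambda \in \Lambda$ and $x \in \mathcal{R}_n$, let the equation $\lambda = T(x, \theta)$ be solvable.
Then one can construct a confidence interval for $\theta$ at any level.


Proof. Let $0 < \alpha < 1$. Then we can choose a pair of numbers $\lambda_1(\alpha)$ and $\lambda_2(\alpha)$ in $\Lambda$, not
necessarily unique, such that

$$P_\theta\{\lambda_1(\alpha) < T(X, \theta) < \lambda_2(\alpha)\} \ge 1-\alpha \quad \text{for all } \theta. \tag{1}$$

Since the distribution of $T$ is independent of $\theta$, it is clear that $\lambda_1$ and $\lambda_2$ are independent
of $\theta$. Since, moreover, $T$ is monotone in $\theta$, we can solve the equations

$$T(x, \theta) = \lambda_1(\alpha) \quad \text{and} \quad T(x, \theta) = \lambda_2(\alpha) \tag{2}$$

for every $x$ uniquely for $\theta$. We have

$$P_\theta\{\underline{\theta}(X) < \theta < \overline{\theta}(X)\} \ge 1-\alpha \quad \text{for all } \theta, \tag{3}$$

where $\underline{\theta}(X) < \overline{\theta}(X)$ are RVs. This completes the proof.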


Remark 1. The condition that $\lambda = T(x, \theta)$ be solvable will be satisfied if, for example, $T$
is continuous and strictly increasing or decreasing as a function of $\theta$ in $\Theta$.


Note that in the continuous case (that is, when the DF of $T$ is continuous) we can find
a confidence interval with equality on the right side of (1). In the discrete case, however,
this is usually not possible.

Remark 2. Relation (1) is valid even when the assumption of monotonicity of $T$ in the
theorem is dropped. In that case inversion of the inequalities may yield a set of intervals
(random set) $S(X)$ in $\Theta$ instead of a confidence interval.


Remark 3. The argument used in Theorem 1 can be extended to cover the multiparameter
case, and the method will determine a confidence set for all the parameters of a distribution.


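Example 1. Let $X_1, X_2, \ldots, X_n \sim \mathcal{N}(\mu, \sigma^2)$, where $\sigma$ is unknown and we seek a
$(1-\alpha)$-level confidence interval for $\mu$. Let us choose

$$T(X, \mu) = \frac{\overline{X} - \mu}{S}\sqrt{n},$$

where $\overline{X}$, $S^2$ are the usual sample statistics. The RV $T(X, \mu)$ has Student's $t$-distribution
with $n-1$ d.f., which is independent of $\mu$, and $T(X, \mu)$, as a function of $\mu$, is monotone. We
can clearly choose $\lambda_1(\alpha)$, $\lambda_2(\alpha)$ (not necessarily uniquely) so that

$$P\{\lambda_1(\alpha) < T(X, \mu) < \lambda_2(\alpha)\} = 1-\alpha \quad \text{for all } \mu.$$

Solving

$$\lambda_1(\alpha) = \frac{\overline{X} - \mu}{S}\sqrt{n} \quad \text{and} \quad \lambda_2(\alpha) = \frac{\overline{X} - \mu}{S}\sqrt{n},$$

we get

$$\underline{\mu}(X) = \overline{X} - \frac{S}{\sqrt{n}}\lambda_2(\alpha), \qquad \overline{\mu}(X) = \overline{X} - \frac{S}{\sqrt{n}}\lambda_1(\alpha),$$

and the $(1-\alpha)$-level confidence interval is

$$\left(\overline{X} - \frac{S}{\sqrt{n}}\lambda_2(\alpha),\ \overline{X} - \frac{S}{\sqrt{n}}\lambda_1(\alpha)\right).$$

In practice, one chooses $\lambda_2(\alpha) = -\lambda_1(\alpha) = t_{n-1,\alpha/2}$.


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF

$$f_\theta(x) = \exp\{-(x-\theta)\}, \quad x > \theta, \text{ and } 0 \text{ elsewhere}.$$

Then the joint PDF of $X$ is

$$f(x; \theta) = \exp\left\{-\sum_{i=1}^{n} x_i + n\theta\right\} I_{[x_{(1)} > \theta]}.$$

Clearly, $T(X, \theta) = X_{(1)} - \theta$ is a pivot. We can choose $\lambda_1(\alpha)$, $\lambda_2(\alpha)$ such that

$$P_\theta\{\lambda_1(\alpha) < X_{(1)} - \theta < \lambda_2(\alpha)\} = 1-\alpha \quad \text{for all } \theta,$$

which yields $(X_{(1)} - \lambda_2(\alpha), X_{(1)} - \lambda_1(\alpha))$ as a $(1-\alpha)$-level confidence interval for $\theta$.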

Remark 4. In Example 1 we chose $\lambda_2 = -\lambda_1$, whereas in Example 2 we did
not indicate how to choose the pair $(\lambda_1, \lambda_2)$ from an infinite set of solutions to
$P_\theta\{\lambda_1(\alpha) < T(X, \theta) < \lambda_2(\alpha)\} = 1-\alpha$. One choice is the equal-tails confidence interval,
which is arrived at by assigning probability $\alpha/2$ to each tail of the distribution of $T$. This
means that we solve

$$\alpha/2 = P_\theta\{T(X, \theta) < \lambda_1\} = P_\theta\{T(X, \theta) > \lambda_2\}.$$

In Example 1 symmetry of the distribution leads to the indicated choice. In Example 2,
$Y = X_{(1)} - \theta$ has PDF

$$g(y) = n\exp(-ny) \quad \text{for } y > 0,$$

so we choose $(\lambda_1, \lambda_2)$ from

$$P_\theta\{X_{(1)} - \theta < \lambda_1\} = \alpha/2 = P_\theta\{X_{(1)} - \theta > \lambda_2\},$$

giving $\lambda_2(\alpha) = -(1/n)\ln(\alpha/2)$ and $\lambda_1(\alpha) = -(1/n)\ln(1-\alpha/2)$, both of which are positive
since $P_\theta\{Y > \lambda_2\} = e^{-n\lambda_2}$ and $P_\theta\{Y < \lambda_1\} = 1 - e^{-n\lambda_1}$. Yet another method is to
choose $\lambda_1$, $\lambda_2$ in such a way that the resulting confidence interval has smallest length. We
will discuss this method in Section 11.4.
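
The equal-tails choice for the shifted exponential pivot $X_{(1)} - \theta$ can be sketched numerically as follows. This is a minimal illustration, assuming the tail equations $1 - e^{-n\lambda_1} = \alpha/2$ and $e^{-n\lambda_2} = \alpha/2$ implied by the density $g(y) = n e^{-ny}$; the function name `shifted_exp_equal_tails` and the sample values are hypothetical.

```python
import math

def shifted_exp_equal_tails(xs, alpha):
    """Equal-tails (1 - alpha)-level interval for theta in f(x) = exp(-(x - theta)), x > theta."""
    n = len(xs)
    x_min = min(xs)                          # X_(1), the smallest observation
    lam1 = -math.log(1 - alpha / 2) / n      # solves P{Y < lam1} = alpha/2
    lam2 = -math.log(alpha / 2) / n          # solves P{Y > lam2} = alpha/2
    return x_min - lam2, x_min - lam1        # (X_(1) - lam2, X_(1) - lam1)

# Made-up data, for illustration only.
sample = [2.0, 2.5, 3.1, 2.2, 4.0, 2.9, 3.3, 2.1, 5.0, 2.4]
print(shifted_exp_equal_tails(sample, alpha=0.10))
```

Because $0 < \lambda_1 < \lambda_2$, both endpoints lie below $X_{(1)}$, as they must since $\theta < X_{(1)}$ with probability 1.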


We next consider the method of test inversion and explore the relationship between a
test of hypothesis for a parameter $\theta$ and a confidence interval for $\theta$. Consider the following
example.


Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma_0^2)$, where $\sigma_0$ is known. In
Example 11.2.1 we showed that

$$\left(\overline{X} - \frac{1}{\sqrt{n}} z_{\alpha/2}\sigma_0,\ \overline{X} + \frac{1}{\sqrt{n}} z_{\alpha/2}\sigma_0\right)$$

is a $(1-\alpha)$-level confidence interval for $\mu$. If we define a test $\varphi$ that rejects a value of
$\mu = \mu_0$ if and only if $\mu_0$ lies outside this interval, that is, if and only if

$$\frac{\sqrt{n}\,|\overline{X} - \mu_0|}{\sigma_0} \ge z_{\alpha/2},$$

then

$$P_{\mu_0}\left\{\sqrt{n}\,\frac{|\overline{X} - \mu_0|}{\sigma_0} \ge z_{\alpha/2}\right\} = \alpha,$$

and the test $\varphi$ is a size $\alpha$ test of $\mu = \mu_0$ against the alternatives $\mu \ne \mu_0$.

Conversely, a family of $\alpha$-level tests for the hypothesis $\mu = \mu_0$ generates a family of
confidence intervals for $\mu$ by simply taking, as the confidence interval for $\mu$, the set of
those $\mu_0$ for which one cannot reject $\mu = \mu_0$.

Similarly, we can generate a family of $\alpha$-level tests from a $(1-\alpha)$-level lower (or upper)
confidence bound. Suppose that we start with the $(1-\alpha)$-level lower confidence bound
$\overline{X} - z_\alpha(\sigma_0/\sqrt{n})$ for $\mu$. Then, by defining a test $\varphi(X)$ that rejects $\mu \le \mu_0$ if and only if
$\mu_0 < \overline{X} - z_\alpha(\sigma_0/\sqrt{n})$, we get an $\alpha$-level test for a hypothesis of the form $\mu \le \mu_0$.
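
The duality between the two-sided test and the confidence interval can be checked numerically. The sketch below is an illustration only: it scans a grid of candidate values $\mu_0$, keeps those that the size-$\alpha$ test does not reject, and compares the result with the interval $\overline{X} \pm z_{\alpha/2}\sigma_0/\sqrt{n}$. The function names `rejects` and `accepted_values`, the grid, and the numbers are hypothetical.

```python
from statistics import NormalDist

def rejects(mu0, xbar, sigma0, n, alpha):
    """Two-sided size-alpha test of mu = mu0 when sigma0 is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return n ** 0.5 * abs(xbar - mu0) / sigma0 >= z

def accepted_values(grid, xbar, sigma0, n, alpha):
    """Grid values mu0 that the test does not reject (the inverted test)."""
    return [m for m in grid if not rejects(m, xbar, sigma0, n, alpha)]

xbar, sigma0, n, alpha = 10.0, 2.0, 16, 0.05
grid = [8.0 + 0.001 * k for k in range(4001)]        # candidate values from 8 to 12
kept = accepted_values(grid, xbar, sigma0, n, alpha)

z = NormalDist().inv_cdf(1 - alpha / 2)
ci = (xbar - z * sigma0 / n ** 0.5, xbar + z * sigma0 / n ** 0.5)
print(min(kept), max(kept), ci)   # the accepted range matches the interval up to the grid step
```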

Example 3 is a special case of the duality principle proved in Theorem 2 below. In the
following we restrict attention to the case in which the rejection (acceptance) region of
the test is the indicator function of a (Borel-measurable) set, that is, we consider only
nonrandomized tests (and confidence intervals). For notational convenience we write
$H_0(\theta_0)$ for the hypothesis $H_0: \theta = \theta_0$ and $H_1(\theta_0)$ for the alternative hypothesis, which
may be one- or two-sided.


Theorem 2. Let $A(\theta_0)$, $\theta_0 \in \Theta$, denote the region of acceptance of an $\alpha$-level test of
$H_0(\theta_0)$. For each observation $x = (x_1, x_2, \ldots, x_n)$ let $S(x)$ denote the set

$$S(x) = \{\theta : x \in A(\theta),\ \theta \in \Theta\}. \tag{4}$$

Then $S(x)$ is a family of confidence sets for $\theta$ at confidence level $1-\alpha$. If, moreover,
$A(\theta_0)$ is UMP for the problem $(\alpha, H_0(\theta_0), H_1(\theta_0))$, then $S(X)$ minimizes

$$P_\theta\{S(X) \ni \theta'\} \quad \text{for all } \theta \in H_1(\theta') \tag{5}$$

among all $(1-\alpha)$-level families of confidence sets. That is, $S(X)$ is UMA.

Proof. We have

$$S(x) \ni \theta \quad \text{if and only if} \quad x \in A(\theta), \tag{6}$$

so that

$$P_\theta\{S(X) \ni \theta\} = P_\theta\{X \in A(\theta)\} \ge 1-\alpha,$$

as asserted.

If $S^*(X)$ is any other family of $(1-\alpha)$-level confidence sets, let $A^*(\theta) = \{x : S^*(x) \ni \theta\}$. Then

$$P_\theta\{X \in A^*(\theta)\} = P_\theta\{S^*(X) \ni \theta\} \ge 1-\alpha,$$

and since $A(\theta_0)$ is UMP for $(\alpha, H_0(\theta_0), H_1(\theta_0))$, it follows that

$$P_\theta\{X \in A^*(\theta_0)\} \ge P_\theta\{X \in A(\theta_0)\} \quad \text{for any } \theta \in H_1(\theta_0).$$

Hence

$$P_\theta\{S^*(X) \ni \theta_0\} \ge P_\theta\{X \in A(\theta_0)\} = P_\theta\{S(X) \ni \theta_0\}$$

for all $\theta \in H_1(\theta_0)$. This completes the proof.

Example 4. Let $X$ be an RV of the continuous type with one-parameter exponential PDF,
given by

$$f_\theta(x) = \exp\{Q(\theta)T(x) + S'(x) + D(\theta)\},$$

where $Q(\theta)$ is a nondecreasing function of $\theta$. Let $H_0: \theta = \theta_0$ and $H_1: \theta < \theta_0$. Then the
acceptance region of a UMP size $\alpha$ test of $H_0$ is of the form

$$A(\theta_0) = \{x : T(x) > c(\theta_0)\}.$$

Since, for $\theta \ge \theta'$,

$$P_{\theta'}\{T(X) \le c(\theta')\} = \alpha = P_\theta\{T(X) \le c(\theta)\} \le P_{\theta'}\{T(X) \le c(\theta)\},$$

$c(\theta)$ may be chosen to be nondecreasing. (The last inequality follows because the power
of the UMP test is at least $\alpha$, the size.) We have

$$S(x) = \{\theta : x \in A(\theta)\},$$

so that $S(x)$ is of the form $(-\infty, c^{-1}(T(x)))$ or $(-\infty, c^{-1}(T(x))]$, where $c^{-1}$ is defined by

$$c^{-1}(T(x)) = \sup_\theta\{\theta : c(\theta) \le T(x)\}.$$

In particular, if $X_1, X_2, \ldots, X_n$ is a sample from

$$f_\theta(x) = \begin{cases} \dfrac{1}{\theta} e^{-x/\theta}, & x > 0, \\ 0, & \text{otherwise}, \end{cases}$$

then $T(x) = \sum_{i=1}^{n} x_i$ and for testing $H_0: \theta = \theta_0$ against $H_1: \theta < \theta_0$, the UMP acceptance
region is of the form

$$A(\theta_0) = \left\{x : \sum_{i=1}^{n} x_i \ge c(\theta_0)\right\},$$

where $c(\theta_0)$ is the unique solution of

$$\int_{c(\theta_0)/\theta_0}^{\infty} \frac{y^{n-1}}{(n-1)!}\, e^{-y}\, dy = 1-\alpha, \quad 0 < \alpha < 1.$$

The UMA family of $(1-\alpha)$-level confidence sets is of the form

$$S(x) = \{\theta : x \in A(\theta)\}.$$

In the case $n = 1$, $c(\theta_0) = \theta_0 \log\dfrac{1}{1-\alpha}$ and $S(x) = \left[0,\ \dfrac{x}{\log\frac{1}{1-\alpha}}\right]$.

Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ RVs. In Problem 9.4.3 we asked the reader
to show that the test

$$\varphi(x) = \begin{cases} 1, & x_{(n)} > \theta_0 \ \text{ or } \ x_{(n)} \le \theta_0\alpha^{1/n}, \\ 0, & \text{otherwise} \end{cases}$$

is a UMP size $\alpha$ test of $\theta = \theta_0$ against $\theta \ne \theta_0$. Then

$$A(\theta_0) = \{x : \theta_0\alpha^{1/n} < x_{(n)} \le \theta_0\}$$

and it follows that $[X_{(n)}, X_{(n)}\alpha^{-1/n})$ is a $(1-\alpha)$-level UMA confidence interval for $\theta$.


The third method we consider is based on Bayesian analysis where we take into account
any prior knowledge that the experimenter has about $\theta$. This is reflected in the specification
of the prior distribution $\pi(\theta)$ on $\Theta$. Under this setup the claims of probability of coverage
are based not on the distribution of $X$ but on the conditional distribution of $\theta$ given $X = x$,
the posterior distribution of $\theta$.

Let $\Theta$ be the parameter set, and let the observable RV $X$ have PDF (PMF) $f_\theta(x)$.
Suppose that we consider $\theta$ as an RV with distribution $\pi(\theta)$ on $\Theta$. Then $f_\theta(x)$ can be considered
as the conditional PDF (PMF) of $X$, given that the RV $\theta$ takes the value $\theta$. Note that we
are using the same symbol for the RV $\theta$ and the value that it assumes. We can determine
the joint distribution of $X$ and $\theta$, the marginal distribution of $X$, and also the conditional
distribution of $\theta$, given $X = x$, as usual. Thus the joint distribution is given by

$$f(x, \theta) = \pi(\theta) f_\theta(x), \tag{7}$$

and the marginal distribution of $X$ by

$$g(x) = \begin{cases} \sum_\theta \pi(\theta) f_\theta(x) & \text{if } \pi \text{ is a PMF}, \\ \int \pi(\theta) f_\theta(x)\, d\theta & \text{if } \pi \text{ is a PDF}. \end{cases} \tag{8}$$

The conditional distribution of $\theta$, given that $x$ is observed, is given by

$$h(\theta \mid x) = \frac{\pi(\theta) f_\theta(x)}{g(x)}, \quad g(x) > 0. \tag{9}$$

Given $h(\theta \mid x)$, it is easy to find functions $l(x)$, $u(x)$ such that

$$P\{l(X) < \theta < u(X) \mid X = x\} \ge 1-\alpha,$$

where

$$P\{l(X) < \theta < u(X) \mid X = x\} = \begin{cases} \int_{l}^{u} h(\theta \mid x)\, d\theta, \\ \sum_{l < \theta < u} h(\theta \mid x), \end{cases} \tag{10}$$

depending on whether $h$ is a PDF or a PMF.

Definition 2. An interval $(l(x), u(x))$ that has probability at least $1-\alpha$ of including $\theta$ is
called a $(1-\alpha)$-level Bayes interval for $\theta$. Also $l(x)$ and $u(x)$ are called the lower and
upper limits of the interval.


One can similarly define one-sided Bayes intervals or $(1-\alpha)$-level lower and upper
Bayes limits.


Remark 5. We note that, under the Bayesian set-up, we can say that $\theta$ lies in the interval
$(l(x), u(x))$ with probability $1-\alpha$ because $l$ and $u$ are computed based
on the posterior distribution of $\theta$ given $x$. In order to emphasize this distinction between
Bayesian and classical analysis, some authors prefer the term credible sets for Bayesian
confidence sets.


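Example 6. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$, $\mu \in \mathcal{R}$, and let the a priori distribution of
$\mu$ be $\mathcal{N}(0, 1)$. Then from Example 8.8.6 we know that $h(\mu \mid x)$ is

$$\mathcal{N}\left(\frac{n\overline{x}}{n+1},\ \frac{1}{n+1}\right).$$

Thus a $(1-\alpha)$-level Bayesian confidence interval is

$$\left(\frac{n\overline{x}}{n+1} - \frac{z_{\alpha/2}}{\sqrt{n+1}},\ \frac{n\overline{x}}{n+1} + \frac{z_{\alpha/2}}{\sqrt{n+1}}\right).$$

A $(1-\alpha)$-level confidence interval for $\mu$ (treating $\mu$ as fixed) is a random interval with
value

$$\left(\overline{x} - \frac{z_{\alpha/2}}{\sqrt{n}},\ \overline{x} + \frac{z_{\alpha/2}}{\sqrt{n}}\right).$$

Thus the Bayesian interval is somewhat shorter in length. This is to be expected since we
assumed more in the Bayesian case.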
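
The comparison in Example 6 can be made concrete with a short computation. The sketch below assumes the $\mathcal{N}(0,1)$ prior and unit sampling variance of the example, so that the posterior is $\mathcal{N}(n\overline{x}/(n+1), 1/(n+1))$; the function name `bayes_and_classical` and the input values are hypothetical.

```python
from statistics import NormalDist

def bayes_and_classical(xbar, n, alpha):
    """Bayes interval (N(0,1) prior) and classical interval for the mean of N(mu, 1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    post_mean = n * xbar / (n + 1)                 # posterior mean shrinks xbar toward 0
    post_half = z / (n + 1) ** 0.5                 # posterior standard deviation is 1/sqrt(n+1)
    bayes = (post_mean - post_half, post_mean + post_half)
    classical = (xbar - z / n ** 0.5, xbar + z / n ** 0.5)
    return bayes, classical

bayes, classical = bayes_and_classical(xbar=1.0, n=9, alpha=0.05)
print(bayes, classical)
# The ratio of the two lengths is sqrt(n / (n + 1)), slightly below 1.
```

The two intervals also differ in their centers: the Bayes interval is centered at $n\overline{x}/(n+1)$ rather than at $\overline{x}$.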


Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and let the prior distribution on
$\Theta = (0, 1)$ be $U(0, 1)$. A simple computation shows that the posterior PDF of $p$, given $x$, is

$$h(p \mid x) = \begin{cases} \dfrac{p^{\sum_1^n x_i}(1-p)^{n - \sum_1^n x_i}}{B\left(\sum_1^n x_i + 1,\ n - \sum_1^n x_i + 1\right)}, & 0 < p < 1, \\ 0, & \text{otherwise}. \end{cases}$$

Given a table of incomplete beta integrals and the observed value of $\sum_1^n x_i$, one can
easily construct a Bayesian confidence interval for $p$.


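Finally, we consider some large sample methods of constructing confidence intervals.
Suppose $T(X) \sim AN(\theta, v(\theta)/n)$. Then

$$\sqrt{n}\,\frac{T(X) - \theta}{\sqrt{v(\theta)}} \xrightarrow{L} Z,$$

where $Z \sim \mathcal{N}(0, 1)$. Suppose further that there is a statistic $S(X)$ such that $S(X) \xrightarrow{P} v(\theta)$.
Then, by Slutsky's theorem

$$\sqrt{n}\,\frac{T(X) - \theta}{\sqrt{S(X)}} \xrightarrow{L} Z,$$

and we can obtain an (approximate) $(1-\alpha)$-level confidence interval for $\theta$ by inverting
the inequality

$$\left|\sqrt{n}\,\frac{T(X) - \theta}{\sqrt{S(X)}}\right| \le z_{\alpha/2}.$$


Example 8. Let $X_1, X_2, \ldots, X_n$ be iid RVs with finite variance. Also, let $EX_i = \mu$ and
$EX_i^2 = \sigma^2 + \mu^2$. From the CLT it follows that

$$\frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{L} Z,$$

where $Z \sim \mathcal{N}(0, 1)$. Suppose that we want a $(1-\alpha)$-level confidence interval for $\mu$ when
$\sigma$ is not known. Since $S \xrightarrow{P} \sigma$, for large $n$ the quantity $[\sqrt{n}(\overline{X} - \mu)/S]$ is approximately
normally distributed with mean 0 and variance 1. Hence, for large $n$, we can find constants
$c_1$, $c_2$ such that

$$P\left\{c_1 < \frac{\overline{X} - \mu}{S}\sqrt{n} < c_2\right\} = 1-\alpha.$$

In particular, we can choose $-c_1 = c_2 = z_{\alpha/2}$ to give

$$\left(\overline{x} - \frac{s}{\sqrt{n}}z_{\alpha/2},\ \overline{x} + \frac{s}{\sqrt{n}}z_{\alpha/2}\right)$$

as an approximate $(1-\alpha)$-level confidence interval for $\mu$.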


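Recall that if $\hat{\theta}$ is the MLE of $\theta$ and the conditions of Theorem 8.7.4 or 8.7.5 are satisfied
(caution: see Remark 8.7.4), then

$$\frac{\sqrt{n}(\hat{\theta} - \theta)}{\sigma} \xrightarrow{L} \mathcal{N}(0, 1) \quad \text{as } n \to \infty,$$

where

$$\sigma^2 = \left[E_\theta\left\{\frac{\partial \log f_\theta(X)}{\partial\theta}\right\}^2\right]^{-1} = \frac{1}{I(\theta)}.$$

Then we can invert the statement

$$P_\theta\left\{-z_{\alpha/2} < \frac{\sqrt{n}(\hat{\theta} - \theta)}{\sigma} < z_{\alpha/2}\right\} \ge 1-\alpha$$

to give an approximate $(1-\alpha)$-level confidence interval for $\theta$.

Yet another possible procedure has universal applicability and hence can be used
for large or small samples. Unfortunately, however, this procedure usually yields
confidence intervals that are much too large in length. The method employs the well-known
Chebychev inequality (see Section 3.4):

$$P\left\{|X - EX| < \varepsilon\sqrt{\operatorname{var}(X)}\right\} > 1 - \frac{1}{\varepsilon^2}.$$

If $\hat{\theta}$ is an estimate of $\theta$ (not necessarily unbiased) with finite variance $\sigma^2(\theta)$, then by
Chebychev's inequality

$$P\left\{|\hat{\theta} - \theta| < \varepsilon\sqrt{E(\hat{\theta} - \theta)^2}\right\} > 1 - \frac{1}{\varepsilon^2}.$$

It follows that

$$\left(\hat{\theta} - \varepsilon\sqrt{E(\hat{\theta} - \theta)^2},\ \hat{\theta} + \varepsilon\sqrt{E(\hat{\theta} - \theta)^2}\right)$$

is a $[1 - (1/\varepsilon^2)]$-level confidence interval for $\theta$. Under some mild consistency conditions
one can replace the normalizing constant $\sqrt{E(\hat{\theta} - \theta)^2}$, which will be some function $\lambda(\theta)$
of $\theta$, by $\lambda(\hat{\theta})$.

Note that the estimator $\hat{\theta}$ need not have a limiting normal law.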


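Example 9. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and suppose it is required to find a confidence
interval for $p$. We know that $E\overline{X} = p$ and

$$\operatorname{var}(\overline{X}) = \frac{\operatorname{var}(X)}{n} = \frac{p(1-p)}{n}.$$

It follows that

$$P\left\{|\overline{X} - p| < \varepsilon\sqrt{\frac{p(1-p)}{n}}\right\} > 1 - \frac{1}{\varepsilon^2}.$$

Since $p(1-p) \le \frac{1}{4}$, we have

$$P\left\{\overline{X} - \frac{1}{2\sqrt{n}}\varepsilon < p < \overline{X} + \frac{1}{2\sqrt{n}}\varepsilon\right\} > 1 - \frac{1}{\varepsilon^2}.$$

One can now choose $\varepsilon$ and $n$ or, if $n$ is kept constant at a given number, $\varepsilon$ to get the
desired level.

Actually the confidence interval obtained above can be improved somewhat. We note
that

$$P\left\{|\overline{X} - p| < \varepsilon\sqrt{\frac{p(1-p)}{n}}\right\} > 1 - \frac{1}{\varepsilon^2},$$

so that

$$P\left\{|\overline{X} - p|^2 < \frac{\varepsilon^2 p(1-p)}{n}\right\} > 1 - \frac{1}{\varepsilon^2}.$$

Now

$$|\overline{X} - p|^2 < \frac{\varepsilon^2}{n}p(1-p)$$

if and only if

$$\left(1 + \frac{\varepsilon^2}{n}\right)p^2 - \left(2\overline{X} + \frac{\varepsilon^2}{n}\right)p + \overline{X}^2 < 0.$$

This last inequality holds if and only if $p$ lies between the two roots of the quadratic
equation

$$\left(1 + \frac{\varepsilon^2}{n}\right)p^2 - \left(2\overline{X} + \frac{\varepsilon^2}{n}\right)p + \overline{X}^2 = 0.$$

The two roots are

$$p_1 = \frac{2\overline{X} + (\varepsilon^2/n) - \sqrt{[2\overline{X} + (\varepsilon^2/n)]^2 - 4[1 + (\varepsilon^2/n)]\overline{X}^2}}{2[1 + (\varepsilon^2/n)]}
= \frac{\overline{X}}{1 + (\varepsilon^2/n)} + \frac{(\varepsilon^2/n) - \sqrt{4(\varepsilon^2/n)\overline{X}(1-\overline{X}) + (\varepsilon^4/n^2)}}{2[1 + (\varepsilon^2/n)]}$$

and

$$p_2 = \frac{2\overline{X} + (\varepsilon^2/n) + \sqrt{[2\overline{X} + (\varepsilon^2/n)]^2 - 4[1 + (\varepsilon^2/n)]\overline{X}^2}}{2[1 + (\varepsilon^2/n)]}
= \frac{\overline{X}}{1 + (\varepsilon^2/n)} + \frac{(\varepsilon^2/n) + \sqrt{4(\varepsilon^2/n)\overline{X}(1-\overline{X}) + (\varepsilon^4/n^2)}}{2[1 + (\varepsilon^2/n)]}.$$

It follows that

$$P\{p_1 < p < p_2\} > 1 - \frac{1}{\varepsilon^2}.$$

Note that when $n$ is large

$$p_1 \approx \overline{X} - \varepsilon\sqrt{\frac{\overline{X}(1-\overline{X})}{n}} \quad \text{and} \quad p_2 \approx \overline{X} + \varepsilon\sqrt{\frac{\overline{X}(1-\overline{X})}{n}},$$

as one should expect in view of the fact that $\overline{X} \to p$ with probability 1 and $\sqrt{\overline{X}(1-\overline{X})/n}$
estimates $\sqrt{p(1-p)/n}$. Alternatively, we could have used the CLT (or large-sample
property of the MLE) to arrive at the same result but with $\varepsilon$ replaced by $z_{\alpha/2}$.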
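
The two Chebychev-based intervals of Example 9 can be compared numerically. The sketch below is illustrative: it computes the crude interval $\overline{X} \pm \varepsilon/(2\sqrt{n})$ and the roots $p_1$, $p_2$ of the quadratic $(1+\varepsilon^2/n)p^2 - (2\overline{X}+\varepsilon^2/n)p + \overline{X}^2 = 0$. The function name `chebyshev_binomial` and the input values are hypothetical.

```python
import math

def chebyshev_binomial(xbar, n, eps):
    """Crude and improved Chebychev intervals for a binomial proportion p."""
    half = eps / (2 * math.sqrt(n))                 # uses the bound p(1 - p) <= 1/4
    simple = (xbar - half, xbar + half)
    e = eps ** 2 / n
    # Discriminant simplifies to 4 e xbar (1 - xbar) + e^2.
    disc = math.sqrt(4 * e * xbar * (1 - xbar) + e ** 2)
    p1 = (2 * xbar + e - disc) / (2 * (1 + e))      # smaller root of the quadratic
    p2 = (2 * xbar + e + disc) / (2 * (1 + e))      # larger root of the quadratic
    return simple, (p1, p2)

# eps = sqrt(20) gives level 1 - 1/eps^2 = 0.95; the data values are made up.
simple, improved = chebyshev_binomial(xbar=0.3, n=100, eps=math.sqrt(20))
print(simple, improved)
```

Since $p(1-p) \le \frac{1}{4}$, the improved interval is always contained in the crude one, which the printed values illustrate.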


Example 10. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. We seek a confidence interval
for the parameter $\theta$. The estimator $\hat{\theta} = X_{(n)}$ is the MLE of $\theta$, which is also sufficient for $\theta$.
From Example 5, $[X_{(n)}, \alpha^{-1/n}X_{(n)})$ is a $(1-\alpha)$-level UMA confidence interval for $\theta$.

Let us now apply the method of Chebychev's inequality to the same problem. We have

$$E_\theta X_{(n)} = \frac{n}{n+1}\theta$$

and

$$E_\theta(X_{(n)} - \theta)^2 = \theta^2\,\frac{2}{(n+1)(n+2)}.$$

Thus

$$P\left\{\frac{|X_{(n)} - \theta|}{\theta}\sqrt{\frac{(n+1)(n+2)}{2}} < \varepsilon\right\} > 1 - \frac{1}{\varepsilon^2}.$$

Since $X_{(n)} \xrightarrow{P} \theta$, we replace $\theta$ by $X_{(n)}$ in the denominator, and, for moderately large $n$,

$$P\left\{\frac{|X_{(n)} - \theta|}{X_{(n)}}\sqrt{\frac{(n+1)(n+2)}{2}} < \varepsilon\right\} > 1 - \frac{1}{\varepsilon^2}.$$

It follows that

$$\left(X_{(n)} - \varepsilon X_{(n)}\frac{\sqrt{2}}{\sqrt{(n+1)(n+2)}},\ X_{(n)} + \varepsilon X_{(n)}\frac{\sqrt{2}}{\sqrt{(n+1)(n+2)}}\right)$$

is a $[1 - (1/\varepsilon^2)]$-level confidence interval for $\theta$. Choosing $1 - (1/\varepsilon^2) = 1-\alpha$, or $\varepsilon = 1/\sqrt{\alpha}$,
and noting that $1/\sqrt{(n+1)(n+2)} \approx 1/n$ for large $n$, and the fact that with probability 1,
$X_{(n)} \le \theta$, we can use the approximate confidence interval

$$\left(X_{(n)},\ X_{(n)}\left(1 + \frac{1}{n}\sqrt{\frac{2}{\alpha}}\right)\right)$$

for $\theta$.

In the examples given above we see that, for a given confidence level $1-\alpha$, a wide
choice of confidence intervals is available. Clearly, the larger the interval, the better the
chance of trapping a true parameter value. Thus the interval $(-\infty, +\infty)$, which ignores the
data completely, will include the real-valued parameter $\theta$ with confidence level 1.
However, the larger the confidence interval, the less meaningful it is. Therefore, for a given
confidence level $1-\alpha$, it is desirable to choose the shortest possible confidence interval.
Since the length $\overline{\theta} - \underline{\theta}$, in general, is a random variable, one can show that a confidence
interval of level $1-\alpha$ with uniformly minimum length among all such intervals does not
exist in most cases. The alternative, to minimize $E_\theta(\overline{\theta} - \underline{\theta})$, is also quite unsatisfactory.
In the next section we consider the problem of finding a shortest-length confidence interval
based on some suitable statistic.
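
To see how much the Chebychev method gives away in Example 10, one can compare the upper endpoints of the UMA interval $[X_{(n)}, \alpha^{-1/n}X_{(n)})$ and the approximate Chebychev interval $(X_{(n)}, X_{(n)}(1 + \sqrt{2/\alpha}/n))$. The sketch below is only a numerical illustration; the function name `uniform_intervals` and the inputs are hypothetical.

```python
import math

def uniform_intervals(x_max, n, alpha):
    """UMA interval and approximate Chebychev interval for theta in U(0, theta)."""
    uma = (x_max, x_max * alpha ** (-1.0 / n))                 # [X_(n), alpha^(-1/n) X_(n))
    cheb = (x_max, x_max * (1 + math.sqrt(2 / alpha) / n))     # (X_(n), X_(n)(1 + sqrt(2/alpha)/n))
    return uma, cheb

uma, cheb = uniform_intervals(x_max=1.0, n=20, alpha=0.05)
print(uma, cheb)   # the Chebychev interval is noticeably longer
```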


PROBLEMS 11.3


1. A sample of size 25 from a normal population with variance 81 produced a mean of
81.2. Find a 0.95 level confidence interval for the mean $\mu$.

2. Let $\overline{X}$ be the mean of a random sample of size $n$ from $\mathcal{N}(\mu, 16)$. Find the smallest
sample size $n$ such that $(\overline{X} - 1, \overline{X} + 1)$ is a 0.90 level confidence interval for $\mu$.

3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from $\mathcal{N}(\mu_1, \sigma^2)$
and $\mathcal{N}(\mu_2, \sigma^2)$, respectively. Find a confidence interval for $\mu_1 - \mu_2$ at confidence
level $1-\alpha$ when (a) $\sigma$ is known and (b) $\sigma$ is unknown.

4. Two independent samples, each of size 7, from normal populations with common
unknown variance $\sigma^2$ produced sample means 4.8 and 5.4 and sample variances
8.38 and 7.62, respectively. Find a 0.95 level confidence interval for $\mu_1 - \mu_2$, the
difference between the means of samples 1 and 2.

5. In Problem 3 suppose that the first population has variance $\sigma_1^2$ and the second
population has variance $\sigma_2^2$, where both $\sigma_1^2$ and $\sigma_2^2$ are known. Find a $(1-\alpha)$-level
confidence interval for $\mu_1 - \mu_2$. What happens if both $\sigma_1^2$ and $\sigma_2^2$ are unknown and
unequal?

6. In Problem 5 find a confidence interval for the ratio $\sigma_2^2/\sigma_1^2$, both when $\mu_1$, $\mu_2$ are
known and when $\mu_1$, $\mu_2$ are unknown. What happens if either $\mu_1$ or $\mu_2$ is unknown
but the other is known?

7. Let $X_1, X_2, \ldots, X_n$ be a sample from a $G(1, \beta)$ distribution. Find a confidence interval
for the parameter $\beta$ with confidence level $1-\alpha$.

8. (a) Use the large-sample properties of the MLE to construct a $(1-\alpha)$-level
confidence interval for the parameter $\theta$ in each of the following cases: (i) $X_1,
X_2, \ldots, X_n$ is a sample from $G(1, 1/\theta)$ and (ii) $X_1, X_2, \ldots, X_n$ is a sample
from $P(\theta)$.

(b) In part (a) use Chebychev's inequality to do the same.

9. For a sample of size 1 from the population

$$f_\theta(x) = \frac{2}{\theta^2}(\theta - x), \quad 0 < x < \theta,$$

find a $(1-\alpha)$-level confidence interval for $\theta$.

10. Let $X_1, X_2, \ldots, X_n$ be a sample from the uniform distribution on $N$ points. Find an
upper $(1-\alpha)$-level confidence bound for $N$, based on $\max(X_1, X_2, \ldots, X_n)$.

11. In Example 10 find the smallest $n$ such that the length of the $(1-\alpha)$-level confidence
interval $(X_{(n)}, \alpha^{-1/n}X_{(n)})$ is $< d$, provided it is known that $\theta \le a$, where $a$ is a known
constant.

12. Let $X$ and $Y$ be independent RVs with PDFs $\lambda e^{-\lambda x}$ $(x > 0)$ and $\mu e^{-\mu y}$ $(y > 0)$,
respectively. Find a $(1-\alpha)$-level confidence region for $(\lambda, \mu)$ of the form $\{(\lambda, \mu): \lambda X +
\mu Y \le k\}$.

13. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where $\sigma^2$ is known. Find a UMA
$(1-\alpha)$-level upper confidence bound for $\mu$.

14. Let $X_1, X_2, \ldots, X_n$ be a sample from a Poisson distribution with unknown
parameter $\lambda$. Assuming that $\lambda$ is a value assumed by a $G(\alpha, \beta)$ RV, find a Bayesian
confidence interval for $\lambda$.

15. Let $X_1, X_2, \ldots, X_n$ be a sample from a geometric distribution with parameter $\theta$.
Assuming that $\theta$ has a priori PDF that is given by the density of a $B(\alpha, \beta)$ RV, find
a Bayesian confidence interval for $\theta$.

16. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$, and suppose that the a priori PDF for
$\mu$ is $U(-1, 1)$. Find a Bayesian confidence interval for $\mu$.

11.4 SHORTEST-LENGTH CONFIDENCE INTERVALS 


We have already remarked that we can increase the confidence level by simply taking
a larger-length confidence interval. Indeed, the worthless interval $-\infty < \theta < \infty$, which
simply says that $\theta$ is a point on the real line, has confidence level 1. In practice, one would
like to set the level at a given fixed number $1-\alpha$ $(0 < \alpha < 1)$ and, if possible, construct an
interval as short in length as possible among all confidence intervals with the same level.
Such an interval is desirable since it is more informative. We have already remarked that
shortest-length confidence intervals do not always exist. In this section we will investigate
the possibility of constructing shortest-length confidence intervals based on simple RVs.
The discussion here is based on Guenther [37]. Theorem 11.3.1 is really the key to the
following discussion.

Let $X_1, X_2, \ldots, X_n$ be a sample from a PDF $f_\theta(x)$, and let $T(X_1, X_2, \ldots, X_n, \theta) = T_\theta$ be a
pivot for $\theta$. Also, let $\lambda_1 = \lambda_1(\alpha)$, $\lambda_2 = \lambda_2(\alpha)$ be chosen so that

$$P\{\lambda_1 < T_\theta < \lambda_2\} = 1-\alpha, \tag{1}$$

and suppose that (1) can be rewritten as

$$P\{\underline{\theta}(X) < \theta < \overline{\theta}(X)\} = 1-\alpha. \tag{2}$$

For every $T_\theta$, $\lambda_1$ and $\lambda_2$ can be chosen in many ways. We would like to choose $\lambda_1$ and
$\lambda_2$ so that $\overline{\theta} - \underline{\theta}$ is minimum. Such an interval is a $(1-\alpha)$-level shortest-length confidence
interval based on $T_\theta$. It may be possible, however, to find another RV $T'_\theta$ that may yield
an even shorter interval. Therefore we are not asserting that the procedure, if it succeeds,
will lead to a $(1-\alpha)$-level confidence interval that has shortest length among all intervals
of this level. For $T_\theta$ we use the simplest RV that is a function of a sufficient statistic and $\theta$.


Remark 1. An alternative to minimizing the length of the confidence interval is to
minimize the expected length $E_\theta\{\overline{\theta}(X) - \underline{\theta}(X)\}$. Unfortunately, this also is quite
unsatisfactory since, in general, there does not exist a member of the class of all $(1-\alpha)$-level
confidence intervals that minimizes $E_\theta\{\overline{\theta}(X) - \underline{\theta}(X)\}$ for all $\theta$. The procedures applied
in finding the shortest-length confidence interval based on a pivot are also applicable in
finding an interval that minimizes the expected length. We remark here that the restriction
to unbiased confidence intervals is natural if we wish to minimize $E_\theta\{\overline{\theta}(X) - \underline{\theta}(X)\}$. See
Section 11.5 for definitions and further details.


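Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where $\sigma^2$ is known. Then $\overline{X}$ is
sufficient for $\mu$, and we take

$$T_\mu(X) = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}}.$$

Then

$$1-\alpha = P\left\{a < \frac{\overline{X} - \mu}{\sigma}\sqrt{n} < b\right\} = P\left\{\overline{X} - b\frac{\sigma}{\sqrt{n}} < \mu < \overline{X} - a\frac{\sigma}{\sqrt{n}}\right\}.$$

The length of this confidence interval is $(\sigma/\sqrt{n})(b-a)$. We wish to minimize $L = (\sigma/\sqrt{n})(b-a)$
such that

$$\Phi(b) - \Phi(a) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx = \int_a^b \varphi(x)\,dx = 1-\alpha.$$

Here $\varphi$ and $\Phi$, respectively, are the PDF and DF of an $\mathcal{N}(0, 1)$ RV. Thus

$$\frac{dL}{da} = \frac{\sigma}{\sqrt{n}}\left(\frac{db}{da} - 1\right)$$

and

$$\varphi(b)\frac{db}{da} - \varphi(a) = 0,$$

giving

$$\frac{dL}{da} = \frac{\sigma}{\sqrt{n}}\left[\frac{\varphi(a)}{\varphi(b)} - 1\right].$$

The minimum occurs when $\varphi(a) = \varphi(b)$, that is, when $a = b$ or $a = -b$. Since $a = b$ does
not satisfy

$$\int_a^b \varphi(t)\,dt = 1-\alpha,$$

we choose $a = -b$. The shortest confidence interval based on $T_\mu$ is therefore the
equal-tails interval,

$$\left(\overline{X} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \overline{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \quad \text{or} \quad \left(\overline{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \overline{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right).$$

The length of this interval is $2z_{\alpha/2}(\sigma/\sqrt{n})$. In this case we can plan our experiment to give
a prescribed confidence level and a prescribed length for the interval. To have level $1-\alpha$
and length $\le 2d$, we choose the smallest $n$ such that

$$d \ge z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{or} \quad n \ge z_{\alpha/2}^2\frac{\sigma^2}{d^2}.$$

This can also be interpreted as follows. If we estimate $\mu$ by $\overline{X}$, taking a sample of size
$n \ge z_{\alpha/2}^2(\sigma^2/d^2)$, we are $100(1-\alpha)$ percent confident that the error in our estimate is at
most $d$.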
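
The sample-size rule $n \ge z_{\alpha/2}^2\sigma^2/d^2$ is easy to evaluate. The following is a minimal sketch; the function name `min_sample_size` and the numerical inputs are hypothetical, and $z_{\alpha/2}$ is again the upper $\alpha/2$ point of $\mathcal{N}(0,1)$.

```python
import math
from statistics import NormalDist

def min_sample_size(sigma, d, alpha):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= d (interval length <= 2d)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / d) ** 2)

# Illustrative values: sigma = 2, error bound d = 0.5, level 0.95.
n = min_sample_size(sigma=2.0, d=0.5, alpha=0.05)
print(n)
```

Rounding up is what guarantees the half-width $z_{\alpha/2}\sigma/\sqrt{n}$ does not exceed $d$; with one observation fewer the bound would be violated.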


Example 2. In Example 1, suppose that $\sigma$ is unknown. In that case we use

$$T_\mu(X) = \frac{\overline{X} - \mu}{S}\sqrt{n}$$

as a pivot. $T_\mu$ has Student's $t$-distribution with $n-1$ d.f. Thus

$$1-\alpha = P\left\{a < \frac{\overline{X} - \mu}{S}\sqrt{n} < b\right\} = P\left\{\overline{X} - b\frac{S}{\sqrt{n}} < \mu < \overline{X} - a\frac{S}{\sqrt{n}}\right\}.$$

We wish to minimize

$$L = (b-a)\frac{S}{\sqrt{n}}$$

subject to

$$\int_a^b f_{n-1}(t)\,dt = 1-\alpha,$$

where $f_{n-1}(t)$ is the PDF of $T_\mu$. We have

$$\frac{dL}{da} = \left(\frac{db}{da} - 1\right)\frac{S}{\sqrt{n}} \quad \text{and} \quad f_{n-1}(b)\frac{db}{da} - f_{n-1}(a) = 0,$$

giving

$$\frac{dL}{da} = \left[\frac{f_{n-1}(a)}{f_{n-1}(b)} - 1\right]\frac{S}{\sqrt{n}}.$$

It follows that the minimum occurs at $a = -b$ (the other solution, $a = b$, is not admissible).
The shortest-length confidence interval based on $T_\mu$ is the equal-tails interval,

$$\left(\overline{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \overline{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right).$$

The length of this interval is $2t_{n-1,\alpha/2}(S/\sqrt{n})$, which, being random, may be arbitrarily
large. Note that the same confidence interval minimizes the expected length of the interval,
namely, $EL = (b-a)c_n(\sigma/\sqrt{n})$, where $c_n$ is a constant determined from $ES = c_n\sigma$, and the
minimum expected length is $2t_{n-1,\alpha/2}c_n(\sigma/\sqrt{n})$.


Example 3. Let X;,X2,...,Xn be iid N(ju,07) RVs. Suppose that ju is known and we want 
a confidence interval for 7. The obvious choice for a pivot T,2 is given by 


(Xi — wy? 


T52 (x) = 2 , 


which has a chi-square distribution with n d.f. Now 


so that 


We wish to minimize 


subject to 


a 


where f,, is the PDF of a chi-square RV with n d.f. We have 


dL 1 ldb\< f 
da (= b? T) eH) 
1 


and 


so that 


which vanishes if 
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Numerical results giving values of a and b to four significant places of decimals are 
available (see Tate and Klett [112]). In practice, the simpler equal-tails interval, 


Grae Se), 


P) 


2 : 2 
Xn,o./2 Xn,l—a/2 


may be used. 
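If $\mu$ is unknown, we use

$$T_{\sigma^2}(\mathbf{X}) = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$$

as a pivot. $T_{\sigma^2}$ has a $\chi^2(n-1)$ distribution. Proceeding as above, we can show that the
shortest-length confidence interval based on $T_{\sigma^2}$ is $((n-1)(S^2/b), (n-1)(S^2/a))$; here $a$
and $b$ are a solution of

$$P\{a < \chi^2(n-1) < b\} = 1-\alpha$$

and

$$a^2 f_{n-1}(a) = b^2 f_{n-1}(b),$$

where $f_{n-1}$ is the PDF of a $\chi^2(n-1)$ RV. Numerical solutions due to Tate and Klett [112]
may be used, but, in practice, the simpler equal-tails confidence interval,

$$\left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right),$$

is employed.


Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0,\theta)$. Then $X_{(n)}$ is sufficient for $\theta$ with
density

$$f_n(y \mid \theta) = \frac{n\,y^{n-1}}{\theta^n}, \qquad 0 < y < \theta.$$

The RV $T_\theta = X_{(n)}/\theta$ has PDF

$$h(t) = n\,t^{n-1}, \qquad 0 < t < 1.$$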


Using $T_\theta$ as pivot, we see that the confidence interval is $(X_{(n)}/b, X_{(n)}/a)$ with length
$L = X_{(n)}(1/a - 1/b)$. We minimize $L$ subject to

$$\int_a^b n\,t^{n-1}\,dt = b^n - a^n = 1-\alpha.$$

Now

$$(1-\alpha)^{1/n} < b \le 1$$

and

$$\frac{dL}{db} = X_{(n)}\left(-\frac{1}{a^2}\frac{da}{db} + \frac{1}{b^2}\right) = X_{(n)}\,\frac{a^{n+1} - b^{n+1}}{b^2 a^{n+1}} < 0,$$

so that the minimum occurs at $b = 1$. The shortest interval is therefore $(X_{(n)}, X_{(n)}/\alpha^{1/n})$.


Note that

$$EL = \left(\frac{1}{a} - \frac{1}{b}\right)EX_{(n)} = \frac{n\theta}{n+1}\left(\frac{1}{a} - \frac{1}{b}\right),$$

which is minimized subject to

$$b^n - a^n = 1-\alpha,$$

where $b = 1$ and $a = \alpha^{1/n}$. The expected length of the interval that minimizes $EL$ is
$[(1/\alpha^{1/n}) - 1][n\theta/(n+1)]$, which is also the expected length of the shortest confidence
interval based on $X_{(n)}$.

Note that the length of the interval $(X_{(n)}, \alpha^{-1/n}X_{(n)})$ goes to 0 as $n \to \infty$.
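
As an informal check of Example 4, the following Python sketch computes the interval $(X_{(n)}, X_{(n)}/\alpha^{1/n})$ and estimates its coverage probability by simulation. The function names, the seed, and the values $\theta = 3$, $n = 10$ are illustrative assumptions, not part of the text.

```python
import random


def shortest_uniform_interval(sample, alpha):
    """Interval (X_(n), X_(n) / alpha**(1/n)) for theta in U(0, theta)."""
    n = len(sample)
    x_max = max(sample)
    # b = 1 and a = alpha**(1/n) in the notation of Example 4.
    return x_max, x_max / alpha ** (1.0 / n)


def estimate_coverage(theta, n, alpha, n_reps, seed=0):
    """Monte Carlo estimate of the coverage probability (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    hits = 0
    for _ in range(n_reps):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        lower, upper = shortest_uniform_interval(sample, alpha)
        if lower <= theta <= upper:
            hits += 1
    return hits / n_reps


# Hypothetical setting: theta = 3, n = 10, nominal level 0.95.
coverage = estimate_coverage(theta=3.0, n=10, alpha=0.05, n_reps=20000)
```

The estimated coverage should be close to the nominal $1 - \alpha$, since $P\{\alpha^{1/n} \le T_\theta \le 1\} = 1 - \alpha$ exactly.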


For some results on asymptotically shortest-length confidence intervals, we refer the 
reader to Wilks [118, pp. 374-376]. 


PROBLEMS 11.4 


1. Let $X_1, X_2, \ldots, X_n$ be a sample from

$$f_\theta(x) = \begin{cases} e^{-(x-\theta)} & \text{if } x > \theta, \\ 0 & \text{otherwise.} \end{cases}$$

Find the shortest-length confidence interval for $\theta$ at level $1-\alpha$ based on a sufficient
statistic for $\theta$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1,\theta)$. Find the shortest-length confidence
interval for $\theta$ at level $1-\alpha$, based on a sufficient statistic for $\theta$.

3. In Problem 11.3.9 how will you find the shortest-length confidence interval for $\theta$ at
level $1-\alpha$, based on the statistic $X/\theta$?

4. Let $T(\mathbf{X},\theta)$ be a pivot of the form $T(\mathbf{X},\theta) = T_1(\mathbf{X}) - \theta$. Show how one can construct
a confidence interval for $\theta$ with fixed width $d$ and maximum possible confidence
coefficient. In particular, construct a confidence interval that has fixed width $d$ and
maximum possible confidence coefficient for the mean $\mu$ of a normal population
with variance 1. Find the smallest size $n$ for which this confidence interval has a
confidence coefficient $\ge 1-\alpha$. Repeat the above in sampling from an exponential
PDF

$$f_\mu(x) = e^{-(x-\mu)} \ \text{ for } x > \mu \quad\text{and}\quad f_\mu(x) = 0 \ \text{ for } x \le \mu.$$

(Desu [21])

5. Let $X_1, X_2, \ldots, X_n$ be a random sample from

$$f_\theta(x) = \frac{1}{2\theta}\exp\{-|x|/\theta\}, \qquad x \in \mathbb{R},\ \theta > 0.$$

Find the shortest-length $(1-\alpha)$-level confidence interval for $\theta$, based on the
sufficient statistic $\sum_{i=1}^{n}|X_i|$.

6. In Example 4, let $R = X_{(n)} - X_{(1)}$. Find a $(1-\alpha)$-level confidence interval for $\theta$ of
the form $(R, R/c)$. Compare the expected length of this interval to the one computed
in Example 4.

7. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Pareto PDF $f_\theta(x) = \theta/x^2$, $x > \theta$, and
$= 0$ for $x \le \theta$. Show that the shortest-length confidence interval for $\theta$ based on $X_{(1)}$
is $(X_{(1)}\alpha^{1/n}, X_{(1)})$. (Use $\theta/X_{(1)}$ as a pivot.)

8. Let $X_1, X_2, \ldots, X_n$ be a sample from PDF $f_\theta(x) = 1/(\theta_2 - \theta_1)$, $\theta_1 < x < \theta_2$, $\theta_1 < \theta_2$,
and $= 0$ otherwise. Let $R = X_{(n)} - X_{(1)}$. Using $R/(\theta_2 - \theta_1)$ as a pivot for estimating
$\theta_2 - \theta_1$, show that the shortest-length confidence interval is of the form $(R, R/c)$,
where $c$ is determined from the level as a solution of $c^{n-1}\{(n-1)c - n\} + \alpha = 0$
(Ferentinos [27]).


11.5 UNBIASED AND EQUIVARIANT CONFIDENCE INTERVALS 


In Section 11.3 we studied test inversion as one of the methods of constructing confidence
intervals. We showed that UMP tests lead to UMA confidence intervals. In Chapter 9 we
saw that UMP tests generally do not exist. In such situations we either restrict consideration
to smaller subclasses of tests by requiring that the test functions have some desirable
properties, or we restrict the class of alternatives to those near the null parameter values.
In this section we will follow a similar approach in constructing confidence intervals.


Definition 1. A family $\{S(\mathbf{x})\}$ of confidence sets for a parameter $\theta$ is said to be unbiased
at confidence level $1-\alpha$ if

$$P_\theta\{S(\mathbf{X}) \text{ contains } \theta\} \ge 1-\alpha \tag{1}$$

and

$$P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\} \le 1-\alpha \quad \text{for all } \theta, \theta' \in \Theta,\ \theta \ne \theta'. \tag{2}$$

If $S(\mathbf{X})$ is an interval satisfying (1) and (2), we call it a $(1-\alpha)$-level unbiased confidence
interval. If a family of unbiased confidence sets at level $1-\alpha$ is UMA in the class of all
$(1-\alpha)$-level unbiased confidence sets, we call it a UMA unbiased (UMAU) family of
confidence sets at level $1-\alpha$. In other words, if $S^*(\mathbf{x})$ satisfies (1) and (2) and minimizes

$$P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\} \quad \text{for } \theta, \theta' \in \Theta,\ \theta \ne \theta'$$

among all unbiased families of confidence sets $S(\mathbf{X})$ at level $1-\alpha$, then $S^*(\mathbf{X})$ is a UMAU
family of confidence sets at level $1-\alpha$.


Remark 1. Definition 1 says that a family $S(\mathbf{X})$ of confidence sets for a parameter $\theta$ is
unbiased at level $1-\alpha$ if the probability of true coverage is at least $1-\alpha$ and that of false
coverage is at most $1-\alpha$. In other words, $S(\mathbf{X})$ traps a true parameter value more often
than it does a false one.


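Theorem 1. Let $A(\theta_0)$ be the acceptance region of a UMP unbiased size $\alpha$ test of
$H_0(\theta_0)\colon \theta = \theta_0$ against $H_1(\theta_0)\colon \theta \ne \theta_0$ for each $\theta_0$. Then $S(\mathbf{x}) = \{\theta\colon \mathbf{x} \in A(\theta)\}$ is a UMA
unbiased family of confidence sets at level $1-\alpha$.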


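Proof. To see that $S(\mathbf{x})$ is unbiased we note that, since $A(\theta)$ is the acceptance region of
an unbiased test,

$$P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\} = P_\theta\{\mathbf{X} \in A(\theta')\} \le 1-\alpha.$$

We next show that $S(\mathbf{X})$ is UMA. Let $S^*(\mathbf{x})$ be any other unbiased $(1-\alpha)$-level family
of confidence sets, and write $A^*(\theta) = \{\mathbf{x}\colon S^*(\mathbf{x}) \text{ contains } \theta\}$. Then $P_\theta\{\mathbf{X} \in A^*(\theta')\} =
P_\theta\{S^*(\mathbf{X}) \text{ contains } \theta'\} \le 1-\alpha$, and it follows that $A^*(\theta)$ is the acceptance region of an
unbiased size $\alpha$ test. Hence

$$\begin{aligned}
P_\theta\{S^*(\mathbf{X}) \text{ contains } \theta'\} &= P_\theta\{\mathbf{X} \in A^*(\theta')\} \\
&\ge P_\theta\{\mathbf{X} \in A(\theta')\} \\
&= P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\}.
\end{aligned}$$

The inequality follows since $A(\theta)$ is the acceptance region of a UMP unbiased test. This
completes the proof.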


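Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu,\sigma^2)$ where both $\mu$ and $\sigma^2$ are
unknown. For testing $H_0\colon \mu = \mu_0$ against $H_1\colon \mu \ne \mu_0$, it is known (Ferguson [28, p. 232])
that the $t$-test

$$\varphi(\mathbf{x}) = \begin{cases} 1, & \left|\dfrac{\sqrt{n}(\bar{x} - \mu_0)}{s}\right| > c, \\ 0, & \text{otherwise,} \end{cases}$$

where $\bar{x} = \sum x_i/n$ and $s^2 = (n-1)^{-1}\sum(x_i - \bar{x})^2$, is UMP unbiased. We choose $c$ from the
size requirement

$$\alpha = P_{\mu=\mu_0}\left\{\left|\frac{\sqrt{n}(\bar{X} - \mu_0)}{S}\right| > c\right\},$$

so that $c = t_{n-1,\alpha/2}$. Thus

$$A(\mu_0) = \left\{\mathbf{x}\colon \left|\frac{\sqrt{n}(\bar{x} - \mu_0)}{s}\right| \le t_{n-1,\alpha/2}\right\}$$

is the acceptance region of a UMP unbiased size $\alpha$ test of $H_0\colon \mu = \mu_0$ against $H_1\colon \mu \ne \mu_0$.
By Theorem 1, it follows that

$$S(\mathbf{x}) = \{\mu\colon \mathbf{x} \in A(\mu)\} = \left\{\bar{x} - \frac{s}{\sqrt{n}}\,t_{n-1,\alpha/2} \le \mu \le \bar{x} + \frac{s}{\sqrt{n}}\,t_{n-1,\alpha/2}\right\}$$

is a UMA unbiased family of confidence sets at level $1-\alpha$.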

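If the measure of precision of a confidence interval is its expected length, one is naturally
led to a consideration of unbiased confidence intervals. Pratt [81] has shown that the
expected length of a confidence interval is the average of false coverage probabilities.


Theorem 2. Let $\Theta$ be an interval on the real line and $f_\theta$ be the PDF of $\mathbf{X}$. Let $S(\mathbf{X})$
be a family of $(1-\alpha)$-level confidence intervals of finite length, that is, let $S(\mathbf{X}) =
(\underline{\theta}(\mathbf{X}), \overline{\theta}(\mathbf{X}))$, and suppose that $\overline{\theta}(\mathbf{X}) - \underline{\theta}(\mathbf{X})$ is (random) finite. Then

$$\int \left(\overline{\theta}(\mathbf{x}) - \underline{\theta}(\mathbf{x})\right) f_\theta(\mathbf{x})\,d\mathbf{x} = \int_{\theta' \ne \theta} P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\}\,d\theta' \tag{3}$$

for all $\theta \in \Theta$.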


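Proof. We have

$$\overline{\theta} - \underline{\theta} = \int_{\underline{\theta}}^{\overline{\theta}} d\theta'.$$

Thus for all $\theta \in \Theta$

$$\begin{aligned}
E_\theta\{\overline{\theta}(\mathbf{X}) - \underline{\theta}(\mathbf{X})\} &= E_\theta\left\{\int_{\underline{\theta}(\mathbf{X})}^{\overline{\theta}(\mathbf{X})} d\theta'\right\} \\
&= \int \left\{\int_{\{\mathbf{x}\colon \underline{\theta}(\mathbf{x}) < \theta' < \overline{\theta}(\mathbf{x})\}} f_\theta(\mathbf{x})\,d\mathbf{x}\right\} d\theta' \\
&= \int P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\}\,d\theta' \\
&= \int_{\theta' \ne \theta} P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\}\,d\theta'.
\end{aligned}$$


Remark 2. If $S(\mathbf{X})$ is a family of UMAU $(1-\alpha)$-level confidence intervals, the expected
length of $S(\mathbf{X})$ is minimal. This follows since the left-hand side of (3) is the expected
length, if $\theta$ is the true value, of $S(\mathbf{X})$ and $P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\}$ is minimal [because
$S(\mathbf{X})$ is UMAU], by Theorem 1, with respect to all families of $1-\alpha$ unbiased confidence
intervals uniformly in $\theta$ ($\theta \ne \theta'$).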


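Since a reasonably complete discussion of UMP unbiased tests (see Section 9.5) is
beyond the scope of this text, the following procedure for determining unbiased confidence
intervals is sometimes quite useful (see Guenther [38]). Let $X_1, X_2, \ldots, X_n$ be a sample
from an absolutely continuous DF with PDF $f_\theta(x)$ and suppose that we seek an unbiased
confidence interval for $\theta$. Following the discussion in Section 11.4, suppose that

$$T(X_1, X_2, \ldots, X_n, \theta) = T(\mathbf{X},\theta) = T_\theta$$

is a pivot, and suppose that the statement

$$P\{\lambda_1(\alpha) < T_\theta < \lambda_2(\alpha)\} = 1-\alpha$$

can be converted to

$$P_\theta\{\underline{\theta}(\mathbf{X}) < \theta < \overline{\theta}(\mathbf{X})\} = 1-\alpha.$$

In order for $(\underline{\theta}, \overline{\theta})$ to be unbiased, we must have

$$P(\theta,\theta') = P_\theta\{\underline{\theta}(\mathbf{X}) < \theta' < \overline{\theta}(\mathbf{X})\} = 1-\alpha \quad \text{if } \theta' = \theta \tag{4}$$

and

$$P(\theta,\theta') < 1-\alpha \quad \text{if } \theta' \ne \theta. \tag{5}$$

If $P(\theta,\theta')$ depends only on a function $\gamma$ of $\theta, \theta'$, we may write

$$P(\gamma) \begin{cases} = 1-\alpha & \text{if } \theta' = \theta, \\ < 1-\alpha & \text{if } \theta' \ne \theta, \end{cases} \tag{6}$$

and it follows that $P(\gamma)$ has a maximum at $\theta' = \theta$.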


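Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu,\sigma^2)$ RVs, and suppose that we desire an
unbiased confidence interval for $\sigma^2$. Then

$$T(\mathbf{X},\sigma^2) = \frac{(n-1)S^2}{\sigma^2} = T_\sigma$$

has a $\chi^2(n-1)$ distribution, and we have

$$P\left\{\lambda_1 < (n-1)\frac{S^2}{\sigma^2} < \lambda_2\right\} = 1-\alpha,$$

so that

$$P\left\{(n-1)\frac{S^2}{\lambda_2} < \sigma^2 < (n-1)\frac{S^2}{\lambda_1}\right\} = 1-\alpha.$$

Then

$$\begin{aligned}
P(\sigma^2, \sigma'^2) &= P_{\sigma^2}\left\{(n-1)\frac{S^2}{\lambda_2} < \sigma'^2 < (n-1)\frac{S^2}{\lambda_1}\right\} \\
&= P\left\{\frac{T_\sigma}{\lambda_2} < \gamma < \frac{T_\sigma}{\lambda_1}\right\},
\end{aligned}$$

where $\gamma = \sigma'^2/\sigma^2$ and $T_\sigma \sim \chi^2(n-1)$. Thus

$$P(\gamma) = P\{\lambda_1\gamma < T_\sigma < \lambda_2\gamma\}.$$

Then

$$P(1) = 1-\alpha$$

and

$$P(\gamma) < 1-\alpha \quad \text{for } \gamma \ne 1.$$

Thus we need $\lambda_1, \lambda_2$ such that

$$P(1) = 1-\alpha \tag{7}$$

and

$$\left.\frac{dP(\gamma)}{d\gamma}\right|_{\gamma=1} = \lambda_2 f_{n-1}(\lambda_2) - \lambda_1 f_{n-1}(\lambda_1) = 0, \tag{8}$$

where $f_{n-1}$ is the PDF of $T_\sigma$. Equations (7) and (8) have been solved numerically for
$\lambda_1, \lambda_2$ by several authors (see, for example, Tate and Klett [112]). Having obtained $\lambda_1, \lambda_2$
from (7) and (8), we have as the unbiased $(1-\alpha)$-level confidence interval

$$\left((n-1)\frac{S^2}{\lambda_2},\ (n-1)\frac{S^2}{\lambda_1}\right). \tag{9}$$

Note that in this case the shortest-length confidence interval (based on $T_\sigma$) derived in
Example 11.4.3, the usual equal-tails confidence interval, and (9) are all different. The
length of the confidence interval (9), however, can be considerably greater than that of the
shortest interval of Example 11.4.3. For large $n$ all three sets of intervals are approximately
the same.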
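
A minimal numerical sketch of how equations (7) and (8) might be solved is given below in Python, using only the standard library. It is not the method of Tate and Klett. The chi-square DF is computed from the standard power series for the incomplete gamma function, and the solution is found by nested bisection, which relies on $x f_{n-1}(x)$ being increasing for $x < n-1$ and decreasing for $x > n-1$. All function names are illustrative assumptions.

```python
import math


def chi2_cdf(x, df):
    """Chi-square DF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def log_x_times_pdf(x, df):
    """log of x * f_df(x), up to a constant that cancels in equation (8)."""
    return 0.5 * df * math.log(x) - 0.5 * x


def matching_upper(lam1, df):
    """Find lam2 > df with lam2 * f(lam2) = lam1 * f(lam1), by bisection."""
    target = log_x_times_pdf(lam1, df)
    lo, hi = float(df), 2.0 * df
    while log_x_times_pdf(hi, df) > target:  # x * f(x) decreases for x > df
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if log_x_times_pdf(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def unbiased_limits(df, alpha):
    """Solve (7) and (8) for (lam1, lam2) by bisection on lam1 in (0, df)."""
    lo, hi = 1e-9, float(df)
    for _ in range(100):
        lam1 = 0.5 * (lo + hi)
        lam2 = matching_upper(lam1, df)
        if chi2_cdf(lam2, df) - chi2_cdf(lam1, df) > 1.0 - alpha:
            lo = lam1  # too much coverage: move lam1 toward the mode x = df
        else:
            hi = lam1
    lam1 = 0.5 * (lo + hi)
    return lam1, matching_upper(lam1, df)


def unbiased_variance_interval(s2, n, alpha):
    """Interval (9) for sigma^2, given the sample variance s2 and sample size n."""
    lam1, lam2 = unbiased_limits(n - 1, alpha)
    return (n - 1) * s2 / lam2, (n - 1) * s2 / lam1


# Hypothetical case: n = 10 observations (9 d.f.) at level 0.95.
lam1, lam2 = unbiased_limits(df=9, alpha=0.05)
```

For serious work the tabulated values cited above, or a tested numerical library, would be preferable to this sketch.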


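Finally, let us briefly investigate how invariance considerations apply to confidence
estimation. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n) \sim f_\theta$, $\theta \in \Theta \subset \mathbb{R}$. Let $\mathcal{G}$ be a group of transformations
on $\mathcal{X}$ which leaves $\mathcal{P} = \{f_\theta\colon \theta \in \Theta\}$ invariant. Let $S(\mathbf{X})$ be a $(1-\alpha)$-level confidence
set for $\theta$.


Definition 2. Let $\mathcal{P}$ be invariant under $\mathcal{G}$ and let $S(\mathbf{x})$ be a confidence set for $\theta$. Then $S$ is
equivariant under $\mathcal{G}$, if for every $\mathbf{x} \in \mathcal{X}$, $\theta \in \Theta$ and $g \in \mathcal{G}$

$$S(\mathbf{x}) \ni \theta \iff S(g(\mathbf{x})) \ni \bar{g}\theta. \tag{10}$$


Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from PDF

$$f_\theta(x) = \exp\{-(x-\theta)\}, \qquad x > \theta,$$

and $= 0$ if $x \le \theta$. Let $\mathcal{G} = \{\{a,1\}\colon a \in \mathbb{R}\}$, where $\{a,1\}\mathbf{x} = (x_1 + a, x_2 + a, \ldots, x_n + a)$,
and $\mathcal{G}$ induces $\bar{\mathcal{G}} = \mathcal{G}$ on $\Theta = \mathbb{R}$. The family $\{f_\theta\}$ remains invariant under $\mathcal{G}$. Consider a
confidence interval of the form

$$S(\mathbf{x}) = \{\theta\colon \bar{x} - c_1 \le \theta \le \bar{x} - c_2\},$$

where $c_1, c_2$ are constants. Then

$$S(\{a,1\}\mathbf{x}) = \{\theta\colon \bar{x} + a - c_1 \le \theta \le \bar{x} + a - c_2\}.$$

Clearly,

$$\begin{aligned}
S(\mathbf{x}) \ni \theta &\iff \bar{x} + a - c_1 \le \theta + a \le \bar{x} + a - c_2 \\
&\iff S(\{a,1\}\mathbf{x}) \ni \bar{g}\theta,
\end{aligned}$$

and it follows that $S(\mathbf{x})$ is an equivariant confidence interval.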


The most useful method of constructing invariant confidence intervals is test inversion.
Inverting the acceptance region of invariant tests often leads to equivariant confidence
intervals under certain conditions. Recall that a group $\mathcal{G}$ of transformations leaves a
hypothesis testing problem invariant if $\mathcal{G}$ leaves both $\Theta_0$ and $\Theta_1$ invariant. For each
$H_0\colon \theta = \theta_0$, $\theta_0 \in \Theta$, we have a different group of transformations, $\mathcal{G}_{\theta_0}$, which leaves the
problem of testing $\theta = \theta_0$ invariant. The equivariant confidence interval, on the other hand,
must be equivariant with respect to $\mathcal{G}$, which is a much larger group since $\mathcal{G} \supset \mathcal{G}_{\theta_0}$ for
all $\theta_0$. The relationship between an equivariant confidence set and invariant tests is more
complicated when the family $\mathcal{P}$ has a nuisance parameter $\tau$.

Under certain conditions there is a relationship between equivariant confidence sets
and associated invariant tests. Rather than pursue this relationship, we refer the reader to
Ferguson [28, p. 262]; it is generally easy to check that (10) holds for a given confidence
interval $S$ to show that $S$ is invariant. The following example illustrates this point.


Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu,\sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown.
In Example 9.5.3 we showed that the test

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n}(x_i - \bar{x})^2 \le \sigma_0^2\,\chi^2_{n-1,1-\alpha}, \\ 0 & \text{otherwise} \end{cases}$$

is UMP invariant, under the translation group, for testing $H_0\colon \sigma^2 \ge \sigma_0^2$ against $H_1\colon \sigma^2 < \sigma_0^2$.
Then the acceptance region of $\varphi$ is

$$A(\sigma_0^2) = \left\{\mathbf{x}\colon \sum_{i=1}^{n}(x_i - \bar{x})^2 > \sigma_0^2\,\chi^2_{n-1,1-\alpha}\right\}.$$

Clearly,

$$\mathbf{x} \in A(\sigma_0^2) \iff \sigma_0^2 < (n-1)s^2/\chi^2_{n-1,1-\alpha},$$

and it follows that

$$S(\mathbf{x}) = \left\{\sigma^2\colon \sigma^2 < (n-1)s^2/\chi^2_{n-1,1-\alpha}\right\}$$

is a $(1-\alpha)$-level confidence interval (upper confidence bound) for $\sigma^2$. We show that $S$ is
invariant with respect to the scale group. In fact

$$S(\{0,c\}\mathbf{x}) = \left\{\sigma^2\colon \sigma^2 < c^2(n-1)s^2/\chi^2_{n-1,1-\alpha}\right\}$$

and

$$S(\mathbf{x}) \ni \sigma^2 \iff c^2\sigma^2 < c^2(n-1)s^2/\chi^2_{n-1,1-\alpha} \iff S(\{0,c\}\mathbf{x}) \ni c^2\sigma^2 = \{0,c\}\sigma^2,$$

and it follows that $S(\mathbf{x})$ is an equivariant confidence interval for $\sigma^2$.


PROBLEMS 11.5 


1. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0,\theta)$. Show that the unbiased confidence
interval for $\theta$ based on the pivot $\max X_i/\theta$ coincides with the shortest-length confidence
interval based on the same pivot.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1,\theta)$. Find the unbiased confidence interval
for $\theta$ based on the pivot $2\sum_{i=1}^{n} X_i/\theta$.

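3. Let $X_1, X_2, \ldots, X_n$ be a sample from PDF

$$f_\theta(x) = \begin{cases} e^{-(x-\theta)} & \text{if } x > \theta, \\ 0 & \text{otherwise.} \end{cases}$$

Find the unbiased confidence interval based on the pivot $2n[\min X_i - \theta]$.

4. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu,\sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown. Using
the pivot $T_\mu = \sqrt{n}(\bar{X} - \mu)/S$ show that the shortest-length unbiased $(1-\alpha)$-level
confidence interval for $\mu$ is the equal-tails interval $(\bar{X} - t_{n-1,\alpha/2}S/\sqrt{n},\ \bar{X} + t_{n-1,\alpha/2}S/\sqrt{n})$.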


5. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \theta/x^2$, $x > \theta$, and $= 0$ otherwise. Find the
shortest-length $(1-\alpha)$-level unbiased confidence interval for $\theta$ based on the pivot

6. Let $X_1, X_2, \ldots, X_n$ be a random sample from a location family $\mathcal{P} = \{f_\theta(x) =
f(x - \theta);\ \theta \in \mathbb{R}\}$. Show that a confidence interval of the form $S(\mathbf{x}) = \{\theta\colon T(\mathbf{x}) - c_1 \le
\theta \le T(\mathbf{x}) + c_2\}$, where $T(\mathbf{x})$ is an equivariant estimate under the location group, is an
equivariant confidence interval.

7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common scale PDF $f_\sigma(x) = \frac{1}{\sigma}f(x/\sigma)$, $\sigma > 0$.
Consider the scale group $\mathcal{G} = \{\{0,b\}\colon b > 0\}$. If $T(\mathbf{x})$ is an equivariant estimate
of $\sigma$, show that a confidence interval of the form

$$S(\mathbf{x}) = \left\{\sigma\colon c_1 \le \frac{T(\mathbf{x})}{\sigma} \le c_2\right\}$$

is equivariant.

8. Let $X_1, X_2, \ldots, X_n$ be iid RVs with PDF $f_\theta(x) = \exp\{-(x-\theta)\}$, $x > \theta$, and $= 0$
otherwise. For testing $H_0\colon \theta = \theta_0$ against $H_1\colon \theta > \theta_0$, consider the (UMP) test

$$\varphi(\mathbf{x}) = 1 \ \text{ if } X_{(1)} \ge \theta_0 - (\ln\alpha)/n, \qquad = 0 \ \text{ otherwise.}$$

Is the acceptance region of this $\alpha$-level test an equivariant $(1-\alpha)$-level confidence
interval (lower bound) for $\theta$ with respect to the location group?


11.6 RESAMPLING: BOOTSTRAP METHOD


In many applications of statistical inference the investigator has a random sample from a 
population distribution DF F which may or may not be completely specified. Indeed the 
empirical data may not even fit any known distribution. The inference is typically based 
on some statistic such as $\bar{X}$, $S^2$, a percentile, or some much more complicated statistic
such as sample correlation coefficient or odds ratio. For this purpose we need to know the 
distribution of the statistic being used and/or its moments. Except for the simple situations 
such as those described in Chapter 6 this is not easy. And even if we are able to get a handle 
on it, it may be inconvenient to deal with it. Often, when the sample is large enough, one 
can resort to asymptotic approximations considered in Chapter 7. Alternatively, one can 
use computer-intensive techniques which have become quite popular in the last 25 years 
due to the availability of fast home or office laptops and desktops. 

Suppose $x_1, x_2, \ldots, x_n$ is a random sample from a distribution $F$ with unknown parameter
$\theta(F)$, and let $\hat{\theta}$ be an estimate of $\theta(F)$. What is the bias of $\hat{\theta}$ and its SE? Resampling
refers to sampling from $x_1, x_2, \ldots, x_n$ and using these samples to estimate the statistical
properties of $\hat{\theta}$. Jackknife is one such method where one uses subsets of the sample by
excluding one or more observations at a time. For each of these subsamples an estimate $\hat{\theta}_j$
of $\theta$ is computed, and these estimates are then used to investigate the statistical properties
of $\hat{\theta}$.
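
The leave-one-out version of the jackknife can be sketched in Python as follows. The bias and SE formulas used are the standard jackknife estimates, $(n-1)(\bar{\theta}_{(\cdot)} - \hat{\theta})$ and $\{\frac{n-1}{n}\sum_j(\hat{\theta}_j - \bar{\theta}_{(\cdot)})^2\}^{1/2}$, where $\bar{\theta}_{(\cdot)}$ is the mean of the $\hat{\theta}_j$; the text above does not state them. The function name and the small data set are hypothetical.

```python
import math
import statistics


def jackknife(sample, estimator):
    """Leave-one-out jackknife: returns (estimate, bias estimate, SE estimate)."""
    n = len(sample)
    theta_hat = estimator(sample)
    # theta_j: the estimate computed with the j-th observation excluded.
    leave_one_out = [estimator(sample[:j] + sample[j + 1:]) for j in range(n)]
    loo_mean = sum(leave_one_out) / n
    bias = (n - 1) * (loo_mean - theta_hat)
    se = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in leave_one_out))
    return theta_hat, bias, se


# Hypothetical data, used only to exercise the function.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
mean_hat, mean_bias, mean_se = jackknife(data, statistics.mean)
var_hat, var_bias, var_se = jackknife(data, statistics.pvariance)
```

For the sample mean the jackknife SE reduces to $s/\sqrt{n}$ and the bias estimate to zero, which gives a quick check on the code.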

The most commonly used resampling method is the bootstrap, introduced by
Efron [22], where one draws random samples of size $n$, with replacement, from
$x_1, x_2, \ldots, x_n$. This allows us to generate a large number of bootstrap samples and hence
bootstrap estimates $\hat{\theta}_b$ of $\theta$. This bootstrap distribution of $\hat{\theta}_b$ is then used to study the
statistical properties of $\hat{\theta}$.

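Let $X^*_{b1}, X^*_{b2}, \ldots, X^*_{bn}$, $b = 1, 2, \ldots, B$, be iid RVs with common DF $F^*_n$, the empirical DF
corresponding to the sample $x_1, x_2, \ldots, x_n$. Then $(X^*_{b1}, X^*_{b2}, \ldots, X^*_{bn})$ is called a bootstrap
sample. Let $\theta$ be the parameter of interest associated with DF $F$ and suppose we have
chosen $\hat{\theta}$ to be an estimate of $\theta$ based on the sample $x_1, x_2, \ldots, x_n$. For each bootstrap
sample let $\hat{\theta}_b$, $b = 1, 2, \ldots, B$, be the corresponding bootstrap estimate of $\theta$. We can now
study the statistical properties of $\hat{\theta}$ based on the distribution of the $\hat{\theta}_b$, $b = 1, 2, \ldots, B$,
values. Let $\theta^* = \sum_{b=1}^{B}\hat{\theta}_b/B$. Then the variance of $\hat{\theta}$ is estimated by the bootstrap variance

$$\operatorname{var}_{bs}(\hat{\theta}) = \operatorname{var}(\hat{\theta}_b) = \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}_b - \theta^*\right)^2. \tag{1}$$

Similarly the bias of $\hat{\theta}$, $b(\hat{\theta}) = E(\hat{\theta}) - \theta$, is estimated by

$$\operatorname{bias}_{bs}(\hat{\theta}) = \theta^* - \hat{\theta}. \tag{2}$$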


Arranging the values of $\hat{\theta}_b$, $b = 1, 2, \ldots, B$, in increasing order of magnitude and then
excluding the smallest and the largest $100(\alpha/2)$ percent of the values we get a $(1-\alpha)$-level
confidence interval for $\theta$. This is the so-called percentile confidence interval. One can also
use this confidence interval to test hypotheses concerning $\theta$.


Example 1. For this example we took a random sample of size 20 from a distribution on
(0.25, 1.25) with the following results.

0.75  0.49  1.14  0.79  0.59  1.14  1.17  0.42  0.57  1.05
0.31  0.46  0.73  0.32  0.81  0.45  0.56  0.42  0.66  0.63

Suppose we wish to estimate the mean $\theta$ of the population distribution. For the sake of
this illustration we use $\hat{\theta} = \bar{x}$ and use the bootstrap to estimate the SE of $\hat{\theta}$.

We took 1000 random samples, with replacement, of size 20 each from this sample
with the following distribution of $\hat{\theta}_b$.

Interval     Frequency
0.49-0.53        6
0.53-0.57       29
0.57-0.61      109
0.61-0.65      200
0.65-0.69      234
0.69-0.73      229
0.73-0.77      123
0.77-0.81       59
0.81-0.85       10
0.85-0.89        2

The bootstrap estimate of $\theta$ is $\theta^* = 0.677$ and that of the SE is 0.061. By excluding
the smallest and the largest twenty-five values of $\hat{\theta}_b$, a 95 percent confidence interval for $\theta$
is given by (0.564, 0.793). (We note that $\bar{x} = 0.673$ and $s = 0.273$, so that SE($\bar{x}$) = 0.061.)
Figure 1 shows the frequency distribution of the bootstrap statistic $\hat{\theta}_b$.
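
The computation in Example 1 can be imitated with the Python sketch below, which uses only the standard library. It applies formulas (1) and (2) and the percentile rule to the twenty observations as we read them from the example. The function name, the fixed seed, and the dictionary keys are illustrative assumptions, and because the random draws differ from those used in the text, the numbers will agree only approximately.

```python
import math
import random
import statistics

# The twenty observations of Example 1, as read from the text.
DATA = [0.75, 0.49, 1.14, 0.79, 0.59, 1.14, 1.17, 0.42, 0.57, 1.05,
        0.31, 0.46, 0.73, 0.32, 0.81, 0.45, 0.56, 0.42, 0.66, 0.63]


def bootstrap_summary(sample, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap SE, bias, and percentile interval for estimator(sample)."""
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable
    n = len(sample)
    theta_hat = estimator(sample)
    # Each bootstrap sample is n draws with replacement from the data.
    boot = sorted(estimator(rng.choices(sample, k=n)) for _ in range(n_boot))
    theta_star = sum(boot) / n_boot
    variance = sum((t - theta_star) ** 2 for t in boot) / (n_boot - 1)  # formula (1)
    bias = theta_star - theta_hat                                        # formula (2)
    cut = int(n_boot * alpha / 2)  # number of values dropped from each tail
    return {
        "theta_hat": theta_hat,
        "theta_star": theta_star,
        "se": math.sqrt(variance),
        "bias": bias,
        "interval": (boot[cut], boot[n_boot - cut - 1]),
    }


summary = bootstrap_summary(DATA, statistics.mean)
```

With `n_boot=1000` and `alpha=0.05`, `cut` equals 25, matching the exclusion of the twenty-five smallest and largest values in the example.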


It is natural to ask how well the distribution of the bootstrap statistic $\hat{\theta}_b$ approximates
the distribution of $\hat{\theta}$. The bootstrap approximation is often better when applied to the
appropriately centered $\hat{\theta}$. Thus to estimate the population mean $\theta$, the bootstrap is applied to the
centered sample mean $\bar{x} - \theta$. The corresponding bootstrapped version will then be $\bar{x}_b - \bar{x}$,
where $\bar{x}_b$ is the sample mean of the $b$th bootstrap sample. Similarly, if $\hat{\theta} = Z_{1/2} = \operatorname{med}(X_1,
X_2, \ldots, X_n)$, then the bootstrap approximation will be applied to the centered sample median
$Z_{1/2} - F^{-1}(0.5)$. The bootstrap version will then be $\operatorname{med}(X^*_{b1}, X^*_{b2}, \ldots, X^*_{bn}) - Z_{1/2}$. Similarly,
in estimation of the distribution of the sample variance $S^2$, the bootstrap version will
be applied to the ratio $S^2/\sigma^2$, where $\sigma^2$ is the variance of the DF $F$.

We have already considered the percentile method of constructing confidence intervals.
Let us denote the $\alpha$th percentile of the distribution of $\hat{\theta}_b$, $b = 1, 2, \ldots, B$, by $B_\alpha$. Suppose that
the sampling distribution of $\hat{\theta} - \theta$ is approximated by the bootstrap distribution of $\hat{\theta}_b - \hat{\theta}$.
Then the probability that $\hat{\theta} - \theta$ is covered by the interval $(B_{\alpha/2} - \hat{\theta},\ B_{1-\alpha/2} - \hat{\theta})$ is
approximately $(1-\alpha)$. This is called a $(1-\alpha)$-level centered bootstrap percentile confidence
interval for $\theta$.

Recall that in sampling from a normal distribution when both the mean and the variance
are unknown, a $(1-\alpha)$-level confidence interval for the mean $\theta$ is based on the $t$-statistic and
is given by $(\bar{x} - t_{n-1,\alpha/2}\,s/\sqrt{n},\ \bar{x} + t_{n-1,\alpha/2}\,s/\sqrt{n})$. For nonnormal distributions the bootstrap
analog of Student's $t$-statistic is the statistic $(\hat{\theta} - \theta)/(\hat{\sigma}/\sqrt{n})$. The bootstrap version is the
statistic $T_b = (\hat{\theta}_b - \hat{\theta})/SE_b$, where $SE_b$ is the SE computed from the bootstrap sample
distribution. A $(1-\alpha)$-level confidence interval is now easily constructed.
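
One way the construction might go for the sample mean is sketched below in Python. We assume that $SE_b$ is $s_b/\sqrt{n}$ computed from the $b$th bootstrap sample, and that the interval is obtained by replacing the $t$ quantiles with the empirical quantiles of the $T_b$ values. Both are our reading of the paragraph, and the function name and seed are hypothetical.

```python
import math
import random
import statistics

# The twenty observations of Example 1, as read from the text.
DATA = [0.75, 0.49, 1.14, 0.79, 0.59, 1.14, 1.17, 0.42, 0.57, 1.05,
        0.31, 0.46, 0.73, 0.32, 0.81, 0.45, 0.56, 0.42, 0.66, 0.63]


def bootstrap_t_interval(sample, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap-t interval for the mean, from T_b = (mean_b - mean) / SE_b."""
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t_values = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)
        se_b = statistics.stdev(resample) / math.sqrt(n)
        if se_b == 0.0:
            continue  # degenerate resample (all values equal); skipped in this sketch
        t_values.append((statistics.mean(resample) - mean) / se_b)
    t_values.sort()
    m = len(t_values)
    cut = int(m * alpha / 2)
    t_low, t_high = t_values[cut], t_values[m - cut - 1]
    # The upper quantile of T_b gives the lower limit, and vice versa.
    return mean - t_high * se, mean - t_low * se


lower, upper = bootstrap_t_interval(DATA)
```

Unlike the normal-theory interval, this interval need not be symmetric about $\bar{x}$, because the empirical quantiles of $T_b$ need not be symmetric about zero.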


[Figure 1. Frequency distribution of the bootstrap statistic $\hat{\theta}_b$.]


In our discussion above we have assumed that $F(\theta)$ is completely unspecified. What if
we know $F$ except for the parameter $\theta$? In that case we take bootstrap samples from the
distribution $F(\hat{\theta})$.

We refer the reader to Efron and Tibshirani [23] for further details. 


PROBLEMS 11.6 


1. (a) Show that there are $\binom{2n-1}{n}$ distinct bootstrap samples of size $n$. [Hint: Problem
1.4.17.]
(b) What is the probability that a bootstrap sample is identical to the original
sample?
(c) What is the most likely bootstrap sample to be drawn?
(d) What is the mean number of times that $x_i$ appears in the bootstrap samples?

2. Let $x_1, x_2, \ldots, x_n$ be a random sample. Then $\hat{\mu} = \bar{x}$ is an estimate of the unknown
mean $\mu$. Consider the leave-one-out Jackknife sample. Let $\hat{\mu}_j$ be the mean of the
remaining $(n-1)$ observations when $x_j$ is excluded:

(a) Show that $x_j = n\hat{\mu} - (n-1)\hat{\mu}_j$.

(b) Now suppose we need to estimate a parameter $\theta$ and choose $\hat{\theta}$ to be an estimate
from the sample. Imitating the Jackknife procedure for estimating $\mu$ we note that
$\theta^*_j = n\hat{\theta} - (n-1)\hat{\theta}_j$. What is the Jackknife estimate of $\theta$? What is the Jackknife
estimate of the bias of $\hat{\theta}$ and its variance?


3. Let $x_1, x_2, \ldots, x_n$ be a random sample from $N(\theta,1)$ and suppose that $\bar{x}$ is an estimate
of $\theta$. Let $X^*_1, X^*_2, \ldots, X^*_n$ be a bootstrap sample from $N(\bar{x},1)$. Show that both $\bar{X} - \theta$
and $\bar{X}^* - \bar{x}$ have the same $N(0,1/n)$ distribution.


4. Consider the data set 
Pitan Fe Pits 2 
Let $x^*_1, x^*_2, x^*_3, x^*_4$ be a bootstrap sample from this data set:
(a) Find the probability that the bootstrap mean equals 2.
(b) Find the probability that the maximum value of the bootstrap mean is 9.
(c) Find the probability that the bootstrap sample mean is 4.


GENERAL LINEAR HYPOTHESIS


12.1 INTRODUCTION 


This chapter deals with the general linear hypothesis. In a wide variety of problems the 
experimenter is interested in making inferences about a vector parameter. For example, 
he may wish to estimate the mean of a multivariate normal or to test some hypotheses 
concerning the mean vector. The problem of estimation can be solved, for example, by 
resorting to the method of maximum likelihood estimation, discussed in Section 8.7. In this 
chapter we restrict ourselves to the so-called linear model problems and concern ourselves 
mainly with problems of hypotheses testing. 

In Section 12.2 we formally describe the general model and derive a test in complete 
generality. In the next four sections we demonstrate the power of this test by solving 
four important testing problems. We will need a considerable amount of linear algebra 
in Section 12.2. 


12.2 GENERAL LINEAR HYPOTHESIS


A wide variety of problems of hypotheses testing can be treated under a general setup. In 
this section we state the general problem and derive the test statistic and its distribution. 


Consider the following examples. 


Example 1. Let $Y_1, Y_2, \ldots, Y_k$ be independent RVs with $EY_i = \mu_i$, $i = 1, 2, \ldots, k$, and common
variance $\sigma^2$. Also, $n_i$ observations are taken on $Y_i$, $i = 1, 2, \ldots, k$, and $\sum_{i=1}^{k} n_i = n$.
It is required to test $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$. The case $k = 2$ has already been treated
in Section 10.4. Problems of this nature arise quite naturally, for example, in agricultural
experiments where one is interested in comparing the average yield when $k$ fertilizers are
available.


Example 2. An experimenter observes the velocity of a particle moving along a line. He
takes observations at given times $t_1, t_2, \ldots, t_n$. Let $\beta_1$ be the initial velocity of the particle
and $\beta_2$ the acceleration; then the velocity at time $t$ is given by $y = \beta_1 + \beta_2 t + \varepsilon$, where $\varepsilon$ is an
RV that is nonobservable (like an error in measurement). In practice, the experimenter does
not know $\beta_1$ and $\beta_2$ and has to use the random observations $Y_1, Y_2, \ldots, Y_n$ made at times
$t_1, t_2, \ldots, t_n$, respectively, to obtain some information about the unknown parameters $\beta_1, \beta_2$.
A similar example is the case when the relation between $y$ and $t$ is governed by

$$y = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon,$$

where $t$ is a mathematical variable, $\beta_0, \beta_1, \beta_2$ are unknown parameters, and $\varepsilon$ is a nonobservable
RV. The experimenter takes observations $Y_1, Y_2, \ldots, Y_n$ at predetermined values
$t_1, t_2, \ldots, t_n$, respectively, and is interested in testing the hypothesis that the relation is in
fact linear, that is, $\beta_2 = 0$.


Examples of the type discussed above and their much more complicated variants can 
all be treated under a general setup. To fix ideas, let us first make the following definition. 


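Definition 1. Let $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)'$ be a random column vector and $\mathbf{X}$ be an $n \times k$
matrix, $k < n$, of known constants $x_{ij}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$. We say that the
distribution of $\mathbf{Y}$ satisfies a linear model if

$$E\mathbf{Y} = \mathbf{X}\beta, \tag{1}$$

where $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ is a vector of unknown (scalar) parameters $\beta_1, \beta_2, \ldots, \beta_k$. It is
convenient to write

$$\mathbf{Y} = \mathbf{X}\beta + \varepsilon, \tag{2}$$

where $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is a vector of nonobservable RVs with $E\varepsilon_j = 0$, $j = 1, 2, \ldots, n$.
Relation (2) is known as a linear model. The general linear hypothesis concerns $\beta$,
namely, that $\beta$ satisfies $H_0\colon \mathbf{H}\beta = 0$, where $\mathbf{H}$ is a known $r \times k$ matrix with $r \le k$.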


In what follows we will assume that $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent, normal RVs with
common variance $\sigma^2$ and $E\varepsilon_j = 0$, $j = 1, 2, \ldots, n$. In view of (2), it follows that $Y_1, Y_2, \ldots, Y_n$
are independent normal RVs with

$$EY_i = \sum_{j=1}^{k} x_{ij}\beta_j \quad\text{and}\quad \operatorname{var}(Y_i) = \sigma^2, \qquad i = 1, 2, \ldots, n. \tag{3}$$

We will assume that $\mathbf{H}$ is a matrix of full rank $r$, $r \le k$, and $\mathbf{X}$ is a matrix of full rank $k < n$.
Some remarks are in order.


Remark 1. Clearly, $\mathbf{Y}$ satisfies a linear model if the vector of means $E\mathbf{Y} = (EY_1,
EY_2, \ldots, EY_n)'$ lies in a $k$-dimensional subspace generated by the linearly independent
column vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$ of the matrix $\mathbf{X}$. Indeed, (1) states that $E\mathbf{Y}$ is a linear combination
of the known vectors $\mathbf{x}_1, \ldots, \mathbf{x}_k$. The general linear hypothesis $H_0\colon \mathbf{H}\beta = 0$ states
that the parameters $\beta_1, \beta_2, \ldots, \beta_k$ satisfy $r$ independent homogeneous linear restrictions. It
follows that, under $H_0$, $E\mathbf{Y}$ lies in a $(k-r)$-dimensional subspace of the $k$-space generated
by $\mathbf{x}_1, \ldots, \mathbf{x}_k$.


Remark 2. The assumption of normality, which is conventional, is made to compute the
likelihood ratio test statistic of $H_0$ and its distribution. If the problem is to estimate $\beta$, no
such assumption is needed. One can use the principle of least squares and estimate $\beta$ by
minimizing the sum of squares,

$$\sum_{i=1}^{n}\varepsilon_i^2 = \varepsilon'\varepsilon = (\mathbf{Y} - \mathbf{X}\beta)'(\mathbf{Y} - \mathbf{X}\beta). \tag{4}$$

The minimizing value $\hat{\beta}(\mathbf{y})$ is known as a least square estimate of $\beta$. This is not a difficult
problem, and we will not discuss it here in any detail but will mention only that any solution
of the so-called normal equations

$$\mathbf{X}'\mathbf{X}\beta = \mathbf{X}'\mathbf{Y} \tag{5}$$

is a least square estimator. If the rank of $\mathbf{X}$ is $k\,(< n)$, then $\mathbf{X}'\mathbf{X}$, which has the same rank
as $\mathbf{X}$, is a nonsingular matrix that can be inverted to give a unique least square estimator

$$\hat{\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}. \tag{6}$$

If the rank of $\mathbf{X}$ is $< k$, then $\mathbf{X}'\mathbf{X}$ is singular and the normal equations do not have a
unique solution. One can show, for example, that $\hat{\beta}$ is unbiased for $\beta$, and if the $Y_i$'s
are uncorrelated with common variance $\sigma^2$, the variance-covariance matrix of the $\hat{\beta}_i$'s is
given by

$$E\left\{(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\right\} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}. \tag{7}$$
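
To make (5) and (6) concrete, here is a small Python sketch that forms $\mathbf{X}'\mathbf{X}$ and $\mathbf{X}'\mathbf{Y}$ and solves the normal equations by Gaussian elimination with partial pivoting, using only the standard library. The function names and the toy data are hypothetical. In practice one would normally call a numerical linear algebra library, and solving the normal equations directly can be numerically delicate when $\mathbf{X}'\mathbf{X}$ is ill-conditioned.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: X does not have full column rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def least_squares(x_rows, y):
    """Least squares estimate of beta from the normal equations X'X beta = X'Y."""
    k = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data lying exactly on the line y = 1 + 2t; each row of X is (1, t).
t_values = [0.0, 1.0, 2.0, 3.0]
design = [[1.0, t] for t in t_values]
beta_hat = least_squares(design, [1.0, 3.0, 5.0, 7.0])
```

Because the toy data lie exactly on a line, `beta_hat` recovers the intercept and slope up to rounding error.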


Remark 3. One can similarly compute the so-called restricted least square estimator of $\beta$
by the usual method of Lagrange multipliers. For example, under $H_0\colon \mathbf{H}\beta = 0$ one simply
minimizes $(\mathbf{Y} - \mathbf{X}\beta)'(\mathbf{Y} - \mathbf{X}\beta)$ subject to $\mathbf{H}\beta = 0$ to get the restricted least square
estimator $\hat{\hat{\beta}}$. The important point is that, if $\varepsilon$ is assumed to be a multivariate normal RV
with mean vector $\mathbf{0}$ and dispersion matrix $\sigma^2\mathbf{I}_n$, the MLE of $\beta$ is the same as the least
square estimator. In fact, one can show that $\hat{\beta}_i$ is the UMVUE of $\beta_i$, $i = 1, 2, \ldots, k$, by the
usual methods.


Example 3. Suppose that a random variable $Y$ is linearly related to a mathematical variable
$x$ that is not random (see Example 2). Let $Y_1, Y_2, \ldots, Y_n$ be observations made at
different known values $x_1, x_2, \ldots, x_n$ of $x$. For example, $x_1, x_2, \ldots, x_n$ may represent different
levels of fertilizer, and $Y_1, Y_2, \ldots, Y_n$, respectively, the corresponding yields of
a crop. Also, $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ represent unobservable RVs that may be errors of measurements.
Then

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$

and we wish to test whether $\beta_1 = 0$, that the fertilizer levels do not affect the yield. Here

$$\mathbf{X} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad \beta = (\beta_0, \beta_1)', \quad\text{and}\quad \varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'.$$

The hypothesis to be tested is $H_0\colon \beta_1 = 0$ so that, with $\mathbf{H} = (0, 1)$, the null hypothesis can
be written as $H_0\colon \mathbf{H}\beta = 0$. This is a problem of linear regression.
Similarly, we may assume that the regression of $Y$ on $x$ is quadratic:

$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon,$$

and we may wish to test that a linear function will be sufficient to describe the relationship,
that is, $\beta_2 = 0$. Here $\mathbf{X}$ is the $n \times 3$ matrix

$$\mathbf{X} = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}, \qquad \beta = (\beta_0, \beta_1, \beta_2)', \quad \varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)',$$

and $\mathbf{H}$ is the $1 \times 3$ matrix $(0, 0, 1)$.
In another example of regression, the $Y$'s can be written as

$$Y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon,$$

and we wish to test the hypothesis that $\beta_1 = \beta_2 = \beta_3$. In this case, $\mathbf{X}$ is the matrix

$$\mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ x_{n1} & x_{n2} & x_{n3} \end{pmatrix}$$

and $\mathbf{H}$ may be chosen to be the $2 \times 3$ matrix

$$\mathbf{H} = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}.$$


Example 4. Another important example of the general linear hypothesis involves the
analysis of variance. We have already derived tests of hypotheses regarding the equality
of the means of two normal populations when the variances are equal. In practice, one
is frequently interested in the equality of several means when the variances are the same,
that is, one has $k$ samples from $N(\mu_1,\sigma^2), \ldots, N(\mu_k,\sigma^2)$, where $\sigma^2$ is unknown and one
wants to test $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$ (see Example 1). Such a situation is of common
occurrence in agricultural experiments. Suppose that $k$ treatments are applied to experimental
units (plots), the $i$th treatment is applied to $n_i$ randomly chosen units, $i = 1, 2, \ldots, k$,
$\sum_{i=1}^{k} n_i = n$, and the observation $y_{ij}$ represents some numerical characteristic (yield) of the
$j$th experimental unit under the $i$th treatment. Suppose also that

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad j = 1, 2, \ldots, n_i;\ i = 1, 2, \ldots, k,$$

where $\varepsilon_{ij}$ are iid $N(0,\sigma^2)$ RVs. We are interested in testing $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$. We
write

$$\mathbf{Y} = (Y_{11}, Y_{12}, \ldots, Y_{1n_1}, Y_{21}, Y_{22}, \ldots, Y_{2n_2}, \ldots, Y_{k1}, Y_{k2}, \ldots, Y_{kn_k})',$$

$$\beta = (\mu_1, \mu_2, \ldots, \mu_k)',$$

$$\mathbf{X} = \begin{pmatrix} \mathbf{1}_{n_1} & 0 & \cdots & 0 \\ 0 & \mathbf{1}_{n_2} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \mathbf{1}_{n_k} \end{pmatrix},$$

where $\mathbf{1}_{n_i} = (1, 1, \ldots, 1)'$ is the $n_i$-vector ($i = 1, 2, \ldots, k$), each of whose elements is unity.
Thus $\mathbf{X}$ is $n \times k$. We can choose

$$\mathbf{H} = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}$$

so that $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$ is of the form $\mathbf{H}\beta = 0$. Here $\mathbf{H}$ is a $(k-1) \times k$ matrix.
The model described in this example is frequently referred to as a one-way analysis of
variance model. This is a very simple example of an analysis of variance model. Note that
the matrix $\mathbf{X}$ is of a very special type, namely, the elements of $\mathbf{X}$ are either 0 or 1. $\mathbf{X}$ is
known as a design matrix.


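Returning to our general model

$$\mathbf{Y} = \mathbf{X}\beta + \varepsilon,$$

we wish to test the null hypothesis $H_0\colon \mathbf{H}\beta = 0$. We will compute the likelihood ratio
test and the distribution of the test statistic. In order to do so, we assume that $\varepsilon$ has a
multivariate normal distribution with mean vector $\mathbf{0}$ and variance-covariance matrix $\sigma^2\mathbf{I}_n$,
where $\sigma^2$ is unknown and $\mathbf{I}_n$ is the $n \times n$ identity matrix. This means that $\mathbf{Y}$ has an $n$-variate
normal distribution with mean $\mathbf{X}\beta$ and dispersion matrix $\sigma^2\mathbf{I}_n$ for some $\beta$ and some $\sigma^2$,
both unknown. Here the parameter space $\Theta$ is the set of $(k+1)$-tuples $(\beta', \sigma^2) = (\beta_1, \beta_2,
\ldots, \beta_k, \sigma^2)$, and the joint PDF of the $Y$'s is given by

$$\begin{aligned}
f_{\beta,\sigma^2}(y_1, y_2, \ldots, y_n) &= \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_1 x_{i1} - \cdots - \beta_k x_{ik})^2\right\} \\
&= \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\left\{-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\beta)'(\mathbf{y} - \mathbf{X}\beta)\right\}.
\end{aligned} \tag{8}$$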

Theorem 1. Consider the linear model

$$Y = X\beta + \varepsilon,$$

where $X$ is an $n\times k$ matrix $((x_{ij}))$, $i=1,2,\ldots,n$, $j=1,2,\ldots,k$, of known constants
and full rank $k<n$, $\beta$ is a vector of unknown parameters $\beta_1,\beta_2,\ldots,\beta_k$, and $\varepsilon = (\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n)'$
is a vector of nonobservable independent normal RVs with common variance $\sigma^2$
and mean $E\varepsilon = 0$. The likelihood ratio test for testing the linear hypothesis $H_0\colon H\beta = 0$,
where $H$ is an $r\times k$ matrix of full rank $r\le k$, is to reject $H_0$ at level $\alpha$ if $F > F_\alpha$, where
$P_{H_0}\{F > F_\alpha\} = \alpha$, and $F$ is the RV given by

$$F = \frac{(Y-X\hat{\hat\beta})'(Y-X\hat{\hat\beta}) - (Y-X\hat\beta)'(Y-X\hat\beta)}{(Y-X\hat\beta)'(Y-X\hat\beta)}.$$    (9)

In (9), $\hat\beta$, $\hat{\hat\beta}$ are the MLE's of $\beta$ under $\Theta$ and $\Theta_0$, respectively. Moreover, the RV
$[(n-k)/r]F$ has an $F$-distribution with $(r,n-k)$ d.f. under $H_0$.


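Proof. The likelihood ratio test of $H_0\colon H\beta = 0$ is to reject $H_0$ if and only if $\lambda(y) < c$,
where

$$\lambda(y) = \frac{\sup_{\theta\in\Theta_0} f_{\beta,\sigma^2}(y)}{\sup_{\theta\in\Theta} f_{\beta,\sigma^2}(y)},$$    (10)

$\theta = (\beta',\sigma^2)'$, and $\Theta_0 = \{(\beta',\sigma^2)'\colon H\beta = 0\}$. Let $\hat\theta = (\hat\beta',\hat\sigma^2)'$ be the MLE of $\theta\in\Theta$,
and $\hat{\hat\theta} = (\hat{\hat\beta}',\hat{\hat\sigma}^2)'$ be the MLE of $\theta$ under $H_0$, that is, when $H\beta = 0$. It is easily seen that
$\hat\beta$ is the value of $\beta$ that minimizes $(y-X\beta)'(y-X\beta)$, and

$$\hat\sigma^2 = n^{-1}(y-X\hat\beta)'(y-X\hat\beta).$$    (11)

Similarly, $\hat{\hat\beta}$ is the value of $\beta$ that minimizes $(y-X\beta)'(y-X\beta)$ subject to $H\beta = 0$, and

$$\hat{\hat\sigma}^2 = n^{-1}(y-X\hat{\hat\beta})'(y-X\hat{\hat\beta}).$$    (12)

It follows that

$$\lambda(y) = \left(\frac{\hat\sigma^2}{\hat{\hat\sigma}^2}\right)^{n/2}.$$    (13)

The critical region $\lambda(y) < c$ is equivalent to the region $\{\lambda(y)\}^{-2/n} > c^{-2/n}$, which is
of the form

$$\frac{\hat{\hat\sigma}^2}{\hat\sigma^2} > c_1.$$    (14)

This may be written as

$$\frac{(y-X\hat{\hat\beta})'(y-X\hat{\hat\beta})}{(y-X\hat\beta)'(y-X\hat\beta)} > c_1$$    (15)

or, equivalently, as

$$\frac{(y-X\hat{\hat\beta})'(y-X\hat{\hat\beta}) - (y-X\hat\beta)'(y-X\hat\beta)}{(y-X\hat\beta)'(y-X\hat\beta)} > c_1 - 1.$$    (16)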


It remains to determine the distribution of the test statistic. For this purpose it is
convenient to reduce the problem to the canonical form. Let $V_n$ be the vector space of
the observation vector $Y$, $V_k$ be the subspace of $V_n$ generated by the column vectors
$x_1,x_2,\ldots,x_k$ of $X$, and $V_{k-r}$ be the subspace of $V_k$ in which $EY$ is postulated to lie
under $H_0$. We change variables from $Y_1,Y_2,\ldots,Y_n$ to $Z_1,Z_2,\ldots,Z_n$, where $Z_1,Z_2,\ldots,Z_n$
are independent normal RVs with common variance $\sigma^2$ and means $EZ_i = \theta_i$, $i=1,2,\ldots,k$,
$EZ_i = 0$, $i=k+1,\ldots,n$. This is done as follows. Let us choose an orthonormal basis
of $k-r$ column vectors $\{\alpha_i\}$ for $V_{k-r}$, say $\{\alpha_{r+1},\alpha_{r+2},\ldots,\alpha_k\}$. We extend this to an
orthonormal basis $\{\alpha_1,\alpha_2,\ldots,\alpha_r,\alpha_{r+1},\ldots,\alpha_k\}$ for $V_k$, and then extend once again to
an orthonormal basis $\{\alpha_1,\alpha_2,\ldots,\alpha_k,\alpha_{k+1},\ldots,\alpha_n\}$ for $V_n$. This is always possible.

Let $z_1,z_2,\ldots,z_n$ be the coordinates of $y$ relative to the basis $\{\alpha_1,\alpha_2,\ldots,\alpha_n\}$. Then $z_i = \alpha_i'y$
and $z = Py$, where $P$ is an orthogonal matrix with $i$th row $\alpha_i'$. Thus $EZ_i = E\alpha_i'Y = \alpha_i'X\beta$,
and $EZ = PX\beta$. Since $X\beta\in V_k$ (Remark 1), it follows that $\alpha_i'X\beta = 0$ for $i > k$.
Similarly, under $H_0$, $X\beta\in V_{k-r}\subset V_k$, so that $\alpha_i'X\beta = 0$ for $i\le r$. Let us write $\omega = PX\beta$.
Then $\omega_{k+1} = \omega_{k+2} = \cdots = \omega_n = 0$, and under $H_0$, $\omega_1 = \omega_2 = \cdots = \omega_r = 0$. Finally, from
Corollary 2 of Theorem 6 it follows that $Z_1,Z_2,\ldots,Z_n$ are independent normal RVs with
the same variance $\sigma^2$ and $EZ_i = \omega_i$, $i=1,2,\ldots,n$. We have thus transformed the problem
to the following simpler canonical form:

$$\Omega\colon\ Z_i \text{ are independent } N(\omega_i,\sigma^2),\quad i=1,2,\ldots,n,$$
$$\qquad \omega_{k+1} = \omega_{k+2} = \cdots = \omega_n = 0,$$    (17)
$$H_0\colon\ \omega_1 = \omega_2 = \cdots = \omega_r = 0.$$


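Now, since $P$ is orthogonal,

$$(y-X\beta)'(y-X\beta) = (P'z-P'\omega)'(P'z-P'\omega) = (z-\omega)'(z-\omega) = \sum_{i=1}^{k}(z_i-\omega_i)^2 + \sum_{i=k+1}^{n} z_i^2.$$    (18)

The quantity $(y-X\beta)'(y-X\beta)$ is minimized if we choose $\hat\omega_i = z_i$, $i=1,2,\ldots,k$, so
that

$$(y-X\hat\beta)'(y-X\hat\beta) = \sum_{i=k+1}^{n} z_i^2.$$    (19)

Under $H_0$, $\omega_1 = \omega_2 = \cdots = \omega_r = 0$, so that $(y-X\beta)'(y-X\beta)$ will be minimized if
we choose $\hat{\hat\omega}_i = z_i$, $i=r+1,\ldots,k$. Thus

$$(y-X\hat{\hat\beta})'(y-X\hat{\hat\beta}) = \sum_{i=1}^{r} z_i^2 + \sum_{i=k+1}^{n} z_i^2.$$    (20)

It follows that

$$F = \frac{\sum_{i=1}^{r} Z_i^2}{\sum_{i=k+1}^{n} Z_i^2}.$$

Now $\sum_{i=k+1}^{n} Z_i^2/\sigma^2$ has a $\chi^2(n-k)$ distribution and, under $H_0$, $\sum_{i=1}^{r} Z_i^2/\sigma^2$ has a $\chi^2(r)$
distribution. Since $\sum_{i=1}^{r} Z_i^2$ and $\sum_{i=k+1}^{n} Z_i^2$ are independent, we see that $[(n-k)/r]F$ is
distributed as $F(r,n-k)$ under $H_0$, as asserted. This completes the proof of the theorem.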


Remark 4. In practice, one does not need to find a transformation that reduces the problem
to the canonical form. As will be done in the following sections, one simply computes the
estimators $\hat\theta$ and $\hat{\hat\theta}$ and then computes the test statistic in any of the equivalent forms (14),
(15), or (16) to apply the F-test.
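
As a small illustration of this remark, the scaled statistic $[(n-k)/r]F$ can be computed directly from the two minimized sums of squares. This is a sketch only; the function name `f_statistic` and the numbers passed to it are hypothetical and not taken from the text.

```python
def f_statistic(rss_null, rss_full, n, k, r):
    """Scaled statistic [(n - k) / r] * F of Theorem 1.

    rss_null: minimum of (y - X b)'(y - X b) subject to H b = 0
    rss_full: unrestricted minimum of the same sum of squares
    """
    if rss_full <= 0:
        raise ValueError("rss_full must be positive")
    F = (rss_null - rss_full) / rss_full
    return (n - k) / r * F


# Hypothetical values: n = 15 observations, k = 3 parameters, r = 2 restrictions.
stat = f_statistic(rss_null=2740.0, rss_full=1600.0, n=15, k=3, r=2)
print(round(stat, 3))  # 4.275
```

The value returned is then compared with the upper $\alpha$ percent point of the $F(r,n-k)$ distribution.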


Remark 5. The computation of $\hat\beta$, $\hat{\hat\beta}$ is greatly facilitated, in view of Remark 3, by using
the principle of least squares. Indeed, this was done in the proof of Theorem 1 when we
reduced the problem of maximum likelihood estimation to that of minimization of the sum of
squares $(y-X\beta)'(y-X\beta)$.


Remark 6. The distribution of the test statistic under $H_1$ is easily determined. We note
that $Z_i/\sigma\sim N(\omega_i/\sigma,1)$ for $i=1,2,\ldots,r$, so that $\sum_{i=1}^{r} Z_i^2/\sigma^2$ has a noncentral chi-square
distribution with $r$ d.f. and noncentrality parameter $\delta = \sum_{i=1}^{r}\omega_i^2/\sigma^2$. It follows that
$[(n-k)/r]F$ has a noncentral $F$-distribution with d.f. $(r,n-k)$ and noncentrality parameter
$\delta$. Under $H_0$, $\delta = 0$, so that $[(n-k)/r]F$ has a central $F(r,n-k)$ distribution. Since
$\sum_{i=1}^{r}\omega_i^2 = \sum_{i=1}^{r}(EZ_i)^2$, it follows from (19) and (20) that if we replace each observation
$Y_i$ by its expected value in the numerator of (16), we get $\sigma^2\delta$.


Remark 7. The general linear hypothesis makes use of the assumption of common variance.
For instance, in Example 4, $Y_{ij}\sim N(\mu_i,\sigma^2)$, $i=1,2,\ldots,k$. Let us suppose that
$Y_{ij}\sim N(\mu_i,\sigma_i^2)$, $i=1,2,\ldots,k$. Then we need to test that $\sigma_1=\sigma_2=\cdots=\sigma_k$ before we
can apply Theorem 1. The case $k=2$ has already been considered in Section 10.3. For the
case where $k>2$ one can show that a UMP unbiased test does not exist. A large-sample
approximation is described in Lehmann [64, pp. 376-377]. It is beyond the scope of this 
book to consider the effects of departures from the underlying assumptions. We refer the 
reader to Scheffé [101, Chapter 10], for a discussion of this topic. 


Remark 8. The general linear model (GLM) is widely used in social sciences where Y 
is often referred to as the response (or dependent) variable and X as the explanatory (or 
independent) variable. In this language the GLM “predicts” a response variable from a 
linear combination of one or more explanatory variables. It should be noted that dependent 
and independent in this context do not have the same meaning as in Chapter 4. Moreover, 
dependence does not imply causality. 


PROBLEMS 12.2 


1. Show that any solution of the normal equations (5) minimizes the sum of squares
$(Y-X\beta)'(Y-X\beta)$.

2. Show that the least squares estimator given in (6) is an unbiased estimator of $\beta$. If the
RVs $Y_i$ are uncorrelated with common variance $\sigma^2$, show that the covariance matrix
of the $\hat\beta_i$'s is given by (7).

3. Under the assumption that $\varepsilon$ [in model (2)] has a multivariate normal distribution
with mean $0$ and dispersion matrix $\sigma^2 I_n$, show that the least squares estimators and
the MLE's of $\beta$ coincide.

4. Prove statements (11) and (12).

5. Determine the expression for the least squares estimator of $\beta$ subject to $H\beta = 0$.


12.3. REGRESSION ANALYSIS 


In this section we study regression analysis, which is a tool to investigate the interrelationship
between two or more variables. Typically, in its simplest form a response random
variable $Y$ is hypothesized to be related to one or more explanatory nonrandom variables
$x_i$. Regression analysis with a single explanatory variable is known as simple regression
and if, in addition, the relationship is thought to be linear, it is called simple linear regression
(Example 12.2.3). In the case where several explanatory variables $x_i$ are involved
the regression is referred to as multiple linear regression. Regression analysis is widely
used in forecasting and prediction. Again this is a special case of GLM.

This section is divided into three subsections. The first subsection deals with multiple 
linear regression where the RV Y is of the continuous type. In the next two subsections we 
study the case when Y is either Bernoulli or a count variable. 


12.3.1 Multiple Linear Regression 


It is convenient to write GLM in the form

$$Y = \beta_0 1_n + X\beta + \varepsilon,$$    (1)

where $Y$, $X$, $\varepsilon$, and $\beta$ are as in Equation (12.2.1), and $1_n$ is the $n\times 1$ column vector
$(1,1,\ldots,1)'$. The parameter $\beta_0$ is usually referred to as the intercept, whereas $\beta$ is known
as the slope vector with $k$ parameters. The least squares estimators (LSEs) of $\beta_0$ and $\beta$ are easily
obtained by minimizing


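$$\sum_{i=1}^{n}(y_i-\beta_0-x_i'\beta)^2, \qquad x_i = (x_{i1},x_{i2},\ldots,x_{ik})',\quad i=1,2,\ldots,n,$$    (2)

resulting in $k+1$ normal equations

$$\beta_0 + \bar{x}'\beta = \bar{y}, \qquad S_{xx}\beta = S_{xy},$$    (3)

where $\bar{x} = n^{-1}\sum_{i=1}^{n}x_i$, $S_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})'$, and $S_{xy} = \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$,
or

$$\hat\beta = S_{xx}^{-1}S_{xy} \quad\text{and}\quad \hat\beta_0 = \bar{y}-\hat\beta'\bar{x}.$$

Moreover,

$$E(\hat\beta_0) = \beta_0, \qquad E(\hat\beta) = \beta$$    (4)

and

$$\operatorname{Cov}\begin{pmatrix}\hat\beta_0\\ \hat\beta\end{pmatrix} = \frac{\sigma^2}{n}\begin{pmatrix} 1+n\bar{x}'S_{xx}^{-1}\bar{x} & -n\bar{x}'S_{xx}^{-1}\\ -nS_{xx}^{-1}\bar{x} & nS_{xx}^{-1}\end{pmatrix}.$$    (5)

An unbiased estimate of $\sigma^2$ is given by

$$\hat\sigma^2 = \frac{1}{n-k-1}(Y-\hat\beta_0 1_n-X\hat\beta)'(Y-\hat\beta_0 1_n-X\hat\beta).$$    (6)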
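
The least squares computation can be sketched in a few lines of plain Python. The sketch assumes the centered form of the normal equations, $S_{xx}\beta = S_{xy}$ with $S_{xx} = \sum(x_i-\bar{x})(x_i-\bar{x})'$ and $S_{xy} = \sum(x_i-\bar{x})(y_i-\bar{y})$, followed by $\hat\beta_0 = \bar{y}-\hat\beta'\bar{x}$; the helper names `solve` and `fit_multiple_regression` and the data are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("normal equations are singular")
        M[c], M[p] = M[p], M[c]
        for i in range(m):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [u - f * v for u, v in zip(M[i], M[c])]
    return [M[i][m] / M[i][i] for i in range(m)]


def fit_multiple_regression(rows, y):
    """Return (b0, b, s2): intercept, slope vector, and RSS / (n - k - 1)."""
    n, k = len(rows), len(rows[0])
    xbar = [sum(r[j] for r in rows) / n for j in range(k)]
    ybar = sum(y) / n
    # Centered cross-product matrix S_xx and vector S_xy.
    Sxx = [[sum((r[a] - xbar[a]) * (r[c] - xbar[c]) for r in rows)
            for c in range(k)] for a in range(k)]
    Sxy = [sum((r[a] - xbar[a]) * (yi - ybar) for r, yi in zip(rows, y))
           for a in range(k)]
    b = solve(Sxx, Sxy)
    b0 = ybar - sum(bj * xj for bj, xj in zip(b, xbar))
    rss = sum((yi - b0 - sum(bj * rj for bj, rj in zip(b, r))) ** 2
              for r, yi in zip(rows, y))
    return b0, b, rss / (n - k - 1)


# Hypothetical noise-free data generated from y = 1 + 2*x1 - x2.
rows = [(0, 1), (1, 0), (2, 2), (3, 1), (4, 3)]
y = [1 + 2 * x1 - x2 for x1, x2 in rows]
b0, b, s2 = fit_multiple_regression(rows, y)
print(b0, b, s2)
```

Because the data are generated without error, the fit should recover the intercept 1 and slopes (2, -1) up to rounding, with a variance estimate of essentially zero.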
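Let us now consider the simple linear regression model

$$Y = \beta_0 1_n + x\beta_1 + \varepsilon.$$    (7)

The LSEs of $(\beta_0,\beta_1)'$ are given by

$$\hat\beta_0 = \bar{y}-\hat\beta_1\bar{x}, \qquad \hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})y_i}{\sum_{i=1}^{n}(x_i-\bar{x})^2},$$    (8)

and

$$\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i-\bar{y}-\hat\beta_1(x_i-\bar{x})\right)^2.$$    (9)

The covariance matrix is given by

$$\operatorname{Cov}\begin{pmatrix}\hat\beta_0\\ \hat\beta_1\end{pmatrix} = \frac{\sigma^2}{n}\begin{pmatrix} 1+\dfrac{n\bar{x}^2}{s_x^2} & -\dfrac{n\bar{x}}{s_x^2}\\[2mm] -\dfrac{n\bar{x}}{s_x^2} & \dfrac{n}{s_x^2}\end{pmatrix},$$    (10)

where $s_x^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2$.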
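Let us now verify these results using the maximum likelihood method.
Clearly, $Y_1,Y_2,\ldots,Y_n$ are independent normal RVs with $EY_i = \beta_0+\beta_1x_i$ and
$\operatorname{var}(Y_i) = \sigma^2$, $i=1,2,\ldots,n$, and $Y$ is an $n$-variate normal random vector with mean $X\beta$
and variance $\sigma^2 I_n$. The joint PDF of $Y$ is given by

$$f(y;\beta_0,\beta_1,\sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)^2\right\}.$$    (11)

It easily follows that the MLE's for $\beta_0$, $\beta_1$, and $\sigma^2$ are given by

$$\hat\beta_0 = \frac{\sum_{i=1}^{n}Y_i}{n}-\hat\beta_1\bar{x},$$    (12)

$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},$$    (13)

and

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i-\hat\beta_0-\hat\beta_1x_i)^2,$$    (14)

where $\bar{x} = n^{-1}\sum_{i=1}^{n}x_i$.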
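If we wish to test $H_0\colon \beta_1 = 0$, we take $H = (0,1)$, so that the model is a special case
of the general linear hypothesis with $k=2$, $r=1$. Under $H_0$ the MLE's are

$$\hat{\hat\beta}_0 = \bar{Y}$$    (15)

and

$$\hat{\hat\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar{Y})^2.$$    (16)

Thus

$$F = \frac{\sum_{i=1}^{n}(Y_i-\bar{Y})^2-\sum_{i=1}^{n}(Y_i-\bar{Y}+\hat\beta_1\bar{x}-\hat\beta_1x_i)^2}{\sum_{i=1}^{n}(Y_i-\bar{Y}+\hat\beta_1\bar{x}-\hat\beta_1x_i)^2} = \frac{\hat\beta_1^2\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}(Y_i-\bar{Y}+\hat\beta_1\bar{x}-\hat\beta_1x_i)^2}.$$    (17)

From Theorem 12.2.1, the statistic $[(n-2)/1]F$ has a central $F(1,n-2)$ distribution
under $H_0$. Since $F(1,n-2)$ is the square of a $t(n-2)$, the likelihood ratio test rejects $H_0$ if

$$|\hat\beta_1|\left\{\frac{(n-2)\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}(Y_i-\bar{Y}+\hat\beta_1\bar{x}-\hat\beta_1x_i)^2}\right\}^{1/2} > c_0,$$    (18)

where $c_0$ is computed from $t$-tables for $n-2$ d.f.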
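For testing $H_0\colon \beta_0 = 0$, we choose $H = (1,0)$ so that the model is again a special case
of the general linear hypothesis. In this case

$$\hat{\hat\beta}_1 = \frac{\sum_{i=1}^{n}x_iY_i}{\sum_{i=1}^{n}x_i^2}$$

and

$$\hat{\hat\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i-\hat{\hat\beta}_1x_i)^2.$$    (19)

It follows that

$$F = \frac{\sum_{i=1}^{n}(Y_i-\hat{\hat\beta}_1x_i)^2-\sum_{i=1}^{n}(Y_i-\bar{Y}+\hat\beta_1\bar{x}-\hat\beta_1x_i)^2}{\sum_{i=1}^{n}(Y_i-\bar{Y}+\hat\beta_1\bar{x}-\hat\beta_1x_i)^2},$$    (20)

and since

$$\hat{\hat\beta}_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(Y_i-\bar{Y})+n\bar{x}\bar{Y}}{\sum_{i=1}^{n}x_i^2} = \frac{\hat\beta_1\sum_{i=1}^{n}(x_i-\bar{x})^2+n\bar{x}(\hat\beta_0+\hat\beta_1\bar{x})}{\sum_{i=1}^{n}x_i^2} = \hat\beta_1+\frac{n\hat\beta_0\bar{x}}{\sum_{i=1}^{n}x_i^2},$$    (21)

we can write the numerator of $F$ as

$$\sum_{i=1}^{n}(Y_i-\hat{\hat\beta}_1x_i)^2-\sum_{i=1}^{n}(Y_i-\hat\beta_0-\hat\beta_1x_i)^2$$    (22)

$$= \sum_{i=1}^{n}\left(Y_i-\hat\beta_1x_i+\hat\beta_1\bar{x}-\bar{Y}+\bar{Y}-\hat\beta_1\bar{x}-\frac{n\hat\beta_0\bar{x}x_i}{\sum_{j=1}^{n}x_j^2}\right)^2-\sum_{i=1}^{n}(Y_i-\bar{Y}+\hat\beta_1\bar{x}-\hat\beta_1x_i)^2$$

$$= \sum_{i=1}^{n}\left(\bar{Y}-\hat\beta_1\bar{x}-\frac{n\hat\beta_0\bar{x}x_i}{\sum_{j=1}^{n}x_j^2}\right)^2+2\sum_{i=1}^{n}(Y_i-\hat\beta_1x_i+\hat\beta_1\bar{x}-\bar{Y})\left(\hat\beta_0-\frac{n\hat\beta_0\bar{x}x_i}{\sum_{j=1}^{n}x_j^2}\right)$$

$$= \hat\beta_0^2\,\frac{n\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}x_i^2}.$$

It follows from Theorem 12.2.1 that the statistic

$$\frac{\hat\beta_0\sqrt{n\sum_{i=1}^{n}(x_i-\bar{x})^2/\sum_{i=1}^{n}x_i^2}}{\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y}+\hat\beta_1\bar{x}-\hat\beta_1x_i)^2/(n-2)}}$$    (23)

has a central $t$-distribution with $n-2$ d.f. under $H_0\colon \beta_0 = 0$. The rejection region is
therefore given by

$$\frac{|\hat\beta_0|\sqrt{n\sum_{i=1}^{n}(x_i-\bar{x})^2/\sum_{i=1}^{n}x_i^2}}{\sqrt{\sum_{i=1}^{n}(Y_i-\hat\beta_0-\hat\beta_1x_i)^2/(n-2)}} > c_0,$$    (24)

where $c_0$ is determined from the tables of the $t(n-2)$ distribution for a given level of
significance $\alpha$.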


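For testing $H_0\colon \beta_0 = \beta_1 = 0$, we choose $H = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}$ so that the model is again a
special case of the general linear hypothesis with $r = 2$. In this case

$$\hat{\hat\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}Y_i^2$$    (25)

and

$$F = \frac{\sum_{i=1}^{n}Y_i^2-\sum_{i=1}^{n}(Y_i-\bar{Y}+\hat\beta_1\bar{x}-\hat\beta_1x_i)^2}{\sum_{i=1}^{n}(Y_i-\bar{Y}+\hat\beta_1\bar{x}-\hat\beta_1x_i)^2} = \frac{n\bar{Y}^2+\hat\beta_1^2\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}(Y_i-\hat\beta_0-\hat\beta_1x_i)^2}$$

$$= \frac{n(\hat\beta_0+\hat\beta_1\bar{x})^2+\hat\beta_1^2\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}(Y_i-\hat\beta_0-\hat\beta_1x_i)^2}.$$    (26)

From Theorem 12.2.1, the statistic $[(n-2)/2]F$ has a central $F(2,n-2)$ distribution under
$H_0\colon \beta_0 = \beta_1 = 0$. It follows that the $\alpha$-level rejection region for $H_0$ is given by

$$\frac{n-2}{2}F > c_0,$$    (27)

where $F$ is given by (26), and $c_0$ is the upper $\alpha$ percent point under the $F(2,n-2)$
distribution.

Remark 1. It is quite easy to modify the analysis above to obtain tests of null hypotheses
$\beta_0 = \beta_0'$, $\beta_1 = \beta_1'$, and $(\beta_0,\beta_1)' = (\beta_0',\beta_1')'$, where $\beta_0'$, $\beta_1'$ are given real numbers
(Problem 4).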


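Remark 2. The confidence intervals for $\beta_0$, $\beta_1$ are also easily obtained. One can show that
a $(1-\alpha)$-level confidence interval for $\beta_0$ is given by

$$\left(\hat\beta_0-t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n}x_i^2\,\sum_{i=1}^{n}(Y_i-\hat\beta_0-\hat\beta_1x_i)^2}{n(n-2)\sum_{i=1}^{n}(x_i-\bar{x})^2}},\ \ \hat\beta_0+t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n}x_i^2\,\sum_{i=1}^{n}(Y_i-\hat\beta_0-\hat\beta_1x_i)^2}{n(n-2)\sum_{i=1}^{n}(x_i-\bar{x})^2}}\right)$$    (28)

and that for $\beta_1$ is given by

$$\left(\hat\beta_1-t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n}(Y_i-\hat\beta_0-\hat\beta_1x_i)^2}{(n-2)\sum_{i=1}^{n}(x_i-\bar{x})^2}},\ \ \hat\beta_1+t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n}(Y_i-\hat\beta_0-\hat\beta_1x_i)^2}{(n-2)\sum_{i=1}^{n}(x_i-\bar{x})^2}}\right).$$    (29)

Similarly, one can obtain confidence sets for $(\beta_0,\beta_1)'$ from the likelihood ratio test of
$(\beta_0,\beta_1)' = (\beta_0',\beta_1')'$. It can be shown that the collection of sets of points $(\beta_0,\beta_1)'$ satisfying

$$\frac{(n-2)\left[n(\hat\beta_0-\beta_0)^2+2n\bar{x}(\hat\beta_0-\beta_0)(\hat\beta_1-\beta_1)+\sum_{i=1}^{n}x_i^2\,(\hat\beta_1-\beta_1)^2\right]}{2\sum_{i=1}^{n}(Y_i-\hat\beta_0-\hat\beta_1x_i)^2} \le F_{2,n-2,\alpha}$$    (30)

is a $(1-\alpha)$-level collection of confidence sets (ellipsoids) for $(\beta_0,\beta_1)'$ centered at
$(\hat\beta_0,\hat\beta_1)'$.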


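Remark 3. Sometimes interest lies in constructing a confidence interval on the unknown
linear regression function $E\{Y\mid x_0\} = \beta_0+\beta_1x_0$ for a given value of $x$, or on a value
of $Y$, given $x = x_0$. We assume that $x_0$ is a value of $x$ distinct from $x_1,x_2,\ldots,x_n$. Clearly,
$\hat\beta_0+\hat\beta_1x_0$ is the maximum likelihood estimator of $\beta_0+\beta_1x_0$. This is also the best linear
unbiased estimator. Let us write $\hat{E}\{Y\mid x_0\} = \hat\beta_0+\hat\beta_1x_0$. Then

$$\hat{E}\{Y\mid x_0\} = \bar{Y}-\hat\beta_1\bar{x}+\hat\beta_1x_0 = \bar{Y}+(x_0-\bar{x})\frac{\sum_{i=1}^{n}(x_i-\bar{x})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},$$

which is clearly a linear function of normal RVs $Y_i$. It follows that $\hat{E}\{Y\mid x_0\}$ is also
normally distributed with mean $E(\hat\beta_0+\hat\beta_1x_0) = \beta_0+\beta_1x_0$ and variance

$$\operatorname{var}\{\hat{E}\{Y\mid x_0\}\} = E\{\hat\beta_0-\beta_0+\hat\beta_1x_0-\beta_1x_0\}^2$$    (31)
$$\qquad = \operatorname{var}(\hat\beta_0)+x_0^2\operatorname{var}(\hat\beta_1)+2x_0\operatorname{cov}(\hat\beta_0,\hat\beta_1)$$
$$\qquad = \sigma^2\left[\frac{1}{n}+\frac{(\bar{x}-x_0)^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right].$$

(See Problem 6.) It follows that

$$\frac{\hat\beta_0+\hat\beta_1x_0-\beta_0-\beta_1x_0}{\sigma\left\{(1/n)+\left[(\bar{x}-x_0)^2/\sum_{i=1}^{n}(x_i-\bar{x})^2\right]\right\}^{1/2}}$$    (32)

is $N(0,1)$. But $\sigma$ is not known, so that we cannot use (32) to construct a confidence interval
for $E\{Y\mid x_0\}$. Since $n\hat\sigma^2/\sigma^2$ is a $\chi^2(n-2)$ RV and $n\hat\sigma^2/\sigma^2$ is independent of $\hat\beta_0+\hat\beta_1x_0$
(why?), it follows that

$$\sqrt{n-2}\;\frac{\hat\beta_0+\hat\beta_1x_0-\beta_0-\beta_1x_0}{\hat\sigma\left\{1+n\left[(\bar{x}-x_0)^2/\sum_{i=1}^{n}(x_i-\bar{x})^2\right]\right\}^{1/2}}$$    (33)

has a $t(n-2)$ distribution. Thus, a $(1-\alpha)$-level confidence interval for $\beta_0+\beta_1x_0$ is
given by

$$\left(\hat\beta_0+\hat\beta_1x_0-t_{n-2,\alpha/2}\,\hat\sigma\sqrt{\frac{n}{n-2}\left[\frac{1}{n}+\frac{(\bar{x}-x_0)^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right]},\ \ \hat\beta_0+\hat\beta_1x_0+t_{n-2,\alpha/2}\,\hat\sigma\sqrt{\frac{n}{n-2}\left[\frac{1}{n}+\frac{(\bar{x}-x_0)^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right]}\right).$$    (34)

In a similar manner one can show (Problem 7) that

$$\left(\hat\beta_0+\hat\beta_1x_0-t_{n-2,\alpha/2}\,\hat\sigma\sqrt{\frac{n}{n-2}\left[\frac{n+1}{n}+\frac{(\bar{x}-x_0)^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right]},\ \ \hat\beta_0+\hat\beta_1x_0+t_{n-2,\alpha/2}\,\hat\sigma\sqrt{\frac{n}{n-2}\left[\frac{n+1}{n}+\frac{(\bar{x}-x_0)^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right]}\right)$$    (35)

is a $(1-\alpha)$-level confidence interval for $Y_0 = \beta_0+\beta_1x_0+\varepsilon$, that is, for the estimated value
$Y_0$ of $Y$ at $x_0$.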


Remark 4. The simple regression model (2) considered above can be generalized in many 
directions. Thus we may consider EY as a polynomial in x of a degree higher than 1, or 
we may regard $EY$ as a function of several variables. Some of these generalizations will
be taken up in the problems. 


Remark 5. Let $(X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)$ be a sample from a bivariate normal population
with parameters $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \sigma_1^2$, $\operatorname{var}(Y) = \sigma_2^2$, and $\operatorname{cov}(X,Y) = \rho\sigma_1\sigma_2$.
In Section 6.6 we computed the PDF of the sample correlation coefficient $R$ and showed
(Remark 6.6.4) that the statistic

$$T = R\sqrt{\frac{n-2}{1-R^2}}$$    (36)

has a $t(n-2)$ distribution, provided that $\rho = 0$. If we wish to test $\rho = 0$, that is, the
independence of two jointly normal RVs, we can base a test on the statistic $T$. Essentially,
we are testing that the population covariance is 0, which implies that the population
regression coefficients are 0. Thus we are testing, in particular, that $\beta_1 = 0$. It is therefore
not surprising that (36) is identical with (18). We emphasize that we derived (36)
for a bivariate normal population, but (18) was derived by taking the $X$'s as fixed and
the distribution of $Y$'s as normal. Note that for a bivariate normal population $E\{Y\mid x\} =
\mu_2+\rho(\sigma_2/\sigma_1)(x-\mu_1)$ is linear, consistent with our model (1) or (2).


Example 1. Let us assume that the following data satisfy a linear regression model
$Y_i = \beta_0+\beta_1x_i+\varepsilon_i$:

x |   0       1       2       3       4       5
y | 0.475   1.007   0.838  -0.618   1.378   0.943

Let us test the null hypothesis that $\beta_1 = 0$. We have

$$\bar{x} = 2.5,\qquad \sum_{i=0}^{5}(x_i-\bar{x})^2 = 17.5,\qquad \bar{y} = 0.671,$$

$$\sum_{i=0}^{5}(x_i-\bar{x})(y_i-\bar{y}) = 0.9985,$$

$$\hat\beta_1 = 0.0571,\qquad \hat\beta_0 = \bar{y}-\hat\beta_1\bar{x} = 0.5279,$$

$$\sum_{i=0}^{5}(y_i-\hat\beta_0-\hat\beta_1x_i)^2 = 2.3571,$$

and

$$|\hat\beta_1|\sqrt{\frac{(n-2)\sum_{i=0}^{5}(x_i-\bar{x})^2}{\sum_{i=0}^{5}(y_i-\hat\beta_0-\hat\beta_1x_i)^2}} = 0.3106.$$

Since $t_{n-2,\alpha/2} = t_{4,0.025} = 2.776 > 0.3106$, we accept $H_0$ at level $\alpha = 0.05$.
Let us next find a 95 percent confidence interval for $E\{Y\mid x = 7\}$. This is given by (34).
We have

$$t_{n-2,\alpha/2}\,\hat\sigma\sqrt{\frac{n}{n-2}\left[\frac{1}{n}+\frac{(\bar{x}-x_0)^2}{\sum(x_i-\bar{x})^2}\right]} = 2.776\sqrt{\frac{2.3571}{6}}\sqrt{\frac{6}{4}\left(\frac{1}{6}+\frac{20.25}{17.5}\right)} = 2.4518,$$

$$\hat\beta_0+\hat\beta_1x_0 = 0.5279+0.0571\times 7 = 0.9276,$$

so that the 95 percent confidence interval is $(-1.5242, 3.3794)$.
(The data were produced from Table ST6, of random numbers with $\mu = 0$, $\sigma = 1$, by
letting $\beta_0 = 1$ and $\beta_1 = 0$ so that $E\{Y\mid x\} = \beta_0+\beta_1x = 1$, which surely lies in the interval.)
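
The arithmetic of this example can be checked with a short script. The sketch below takes the data as x = 0, 1, ..., 5 and y = 0.475, 1.007, 0.838, -0.618, 1.378, 0.943, the values consistent with the sums $\bar{x} = 2.5$, $\sum(x_i-\bar{x})^2 = 17.5$, and $\sum(x_i-\bar{x})(y_i-\bar{y}) = 0.9985$ reported in the example; the critical value 2.776 is the tabled $t_{4,0.025}$ quoted there, and the variable names are illustrative.

```python
from math import sqrt

x = [0, 1, 2, 3, 4, 5]
y = [0.475, 1.007, 0.838, -0.618, 1.378, 0.943]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Least squares (and maximum likelihood) estimates of slope and intercept.
b1 = sxy / sxx
b0 = ybar - b1 * xbar
rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Statistic (18) for H0: beta_1 = 0, compared with t(n - 2).
t_stat = abs(b1) * sqrt((n - 2) * sxx / rss)

# Interval (34) for E{Y | x0}; note sigma_hat^2 * n / (n - 2) = rss / (n - 2).
x0 = 7
t_crit = 2.776  # tabled value of t_{4, 0.025}
fit = b0 + b1 * x0
half_width = t_crit * sqrt(rss / (n - 2) * (1 / n + (xbar - x0) ** 2 / sxx))

print(b1, b0, rss, t_stat)
print(fit - half_width, fit + half_width)
```

Small differences from the printed values in the last decimal place come from rounding of intermediate quantities.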


12.3.2 Logistic and Poisson Regression 


In the regression model considered above $Y$ is a continuous type RV. However, in a wide
variety of problems $Y$ is either binary or a count variable. Thus in a medical study $Y$
may be the presence or absence of a disease such as diabetes. How do we modify the linear
regression model to apply in this case? The idea here is to choose a function $f$ of $E(Y)$ so
that, as in Section 12.3.1,

$$f(E(Y)) = X\beta.$$

This can be accomplished by choosing the function $f$ to be the logarithm of the odds ratio

$$f(p) = \log\left(\frac{p}{1-p}\right),$$    (37)

where $p = P(Y=1)$ so that $E(Y) = p$. It follows that

$$p = E(Y) = P(Y=1) = \frac{\exp(X\beta)}{1+\exp(X\beta)},$$    (38)

so that logistic regression models the logarithm of the odds ratio as a linear function of the RVs $X_i$.
The term logistic regression derives from the fact that the function $e^x/(1+e^x)$ is known
as the logistic function.

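For simplicity we will only consider the simple linear regression model case so that

$$E(Y_i) = \pi_i(\beta_0+\beta x_i),\quad i=1,2,\ldots,n,\qquad 0 < \pi_i(\beta_0+\beta x_i) < 1.$$    (39)

Choosing the logistic distribution as

$$\pi_i = \pi_i(\beta_0+\beta x_i) = \frac{\exp(\beta_0+\beta x_i)}{1+\exp(\beta_0+\beta x_i)},$$    (40)

let $Y_1,Y_2,\ldots,Y_n$ be independent binary RVs taking values 0 or 1. Then the joint PMF of $Y_1,Y_2,\ldots,Y_n$
is given by

$$L(\beta_0,\beta\mid x) = \prod_{i=1}^{n}\pi_i^{y_i}(1-\pi_i)^{1-y_i} = \left[\prod_{i=1}^{n}(1-\pi_i)\right]\exp\left\{\sum_{i=1}^{n}y_i\log\left(\frac{\pi_i}{1-\pi_i}\right)\right\}$$    (41)

and the log likelihood function by

$$\log L(\beta_0,\beta\mid x) = n\bar{y}\beta_0+\beta\sum_{i=1}^{n}x_iy_i-\sum_{i=1}^{n}\log\{1+\exp(\beta_0+\beta x_i)\}.$$    (42)

It is easy to see that

$$\frac{\partial\log L}{\partial\beta_0} = n\bar{y}-\sum_{i=1}^{n}\pi_i = 0,\qquad \frac{\partial\log L}{\partial\beta} = \sum_{i=1}^{n}x_iy_i-\sum_{i=1}^{n}x_i\pi_i = 0.$$    (43)

Since the likelihood equations are nonlinear in the parameters, the MLEs of $\beta_0$ and $\beta$ are
obtained numerically by using the Newton-Raphson method.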
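
A minimal sketch of the Newton-Raphson iteration for this two-parameter model is given below. It assumes the likelihood equations $\sum(y_i-\pi_i) = 0$ and $\sum x_i(y_i-\pi_i) = 0$ with $\pi_i = \exp(\beta_0+\beta x_i)/\{1+\exp(\beta_0+\beta x_i)\}$; the function name `fit_logistic`, the fixed iteration count, and the data are hypothetical choices made for illustration only.

```python
from math import exp


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson for P(Y_i = 1) = exp(b0 + b x_i) / (1 + exp(b0 + b x_i))."""
    b0 = b = 0.0
    for _ in range(iterations):
        pi = [1.0 / (1.0 + exp(-(b0 + b * xi))) for xi in x]
        # Score vector: the left-hand sides of the likelihood equations.
        u0 = sum(yi - p for yi, p in zip(y, pi))
        u1 = sum(xi * (yi - p) for xi, yi, p in zip(x, y, pi))
        # Information matrix (negative Hessian) with weights pi_i (1 - pi_i).
        w = [p * (1.0 - p) for p in pi]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step (b0, b) += I^{-1} U, with the 2 x 2 inverse written out.
        b0 += (i11 * u0 - i01 * u1) / det
        b += (-i01 * u0 + i00 * u1) / det
    return b0, b


# Hypothetical binary responses observed at x = 0, 1, ..., 7.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 1, 0, 1]
b0_hat, b_hat = fit_logistic(x, y)
print(b0_hat, b_hat)
```

The sketch has no safeguards: for data in which the 0's and 1's are perfectly separated by x the MLE does not exist and the iteration would diverge.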

Let $\hat\beta_0$ and $\hat\beta$ be the MLEs of $\beta_0$ and $\beta$, respectively. From Section 8.7 we note that the
variance of $\hat\beta$ is given by


var (3 \=D8 mi(1—77), (44) 


so that the standard error (SE) of $\hat\beta$ is its square root. For large $n$, the so-called Wald
statistic $Z = \hat\beta/\mathrm{SE}(\hat\beta)$ has an approximate $N(0,1)$ distribution under $H_0\colon \beta = 0$. Thus we
reject $H_0$ at level $\alpha$ if $|z| > z_{\alpha/2}$. One can use $\hat\beta\pm z_{\alpha/2}\,\mathrm{SE}(\hat\beta)$ as a $(1-\alpha)$-level confidence
interval for $\beta$.

Yet another choice for testing $H_0$ is to use the LRT statistic $-2\log\lambda$ (see Theorem
10.2.3). Under $H_0$, $-2\log\lambda$ has asymptotically a chi-square distribution with 1 d.f. Here

$$\lambda = \frac{L(\hat{\hat\beta}_0,0\mid x)}{L(\hat\beta_0,\hat\beta\mid x)}.$$

In (40) we chose the DF of a logistic RV. We could have chosen some other DF such
as $\Phi(x)$, the DF of a $N(0,1)$ RV. In that case we have $\pi_i(\beta_0+\beta x_i) = \Phi(\beta_0+\beta x_i)$. The resulting model
is called probit regression.

We finally consider the case when the RV $Y$ is a count of rare events and has a Poisson
distribution with parameter $\lambda$. Clearly, the GLM is not directly applicable. Again we only
consider the linear regression model case. Let $Y_i$, $i=1,2,\ldots,n$, be independent $P(\lambda_i)$ RVs
where $\lambda_i = \exp(\beta_0+x_i\beta_1)$, so that

$$\theta_i = \log\lambda_i = \beta_0+x_i\beta_1.$$    (45)

The log likelihood function is given by

$$\log L(\beta_0,\beta_1;y_1,\ldots,y_n) = \sum_{i=1}^{n}\{y_i\theta_i-e^{\theta_i}-\log(y_i!)\}.$$    (46)

In order to find the MLEs of $\beta_0$ and $\beta_1$ we need to solve the likelihood equations

$$\frac{\partial\log L}{\partial\beta_0} = \sum_{i=1}^{n}\{y_i-e^{\theta_i}\} = 0,\qquad \frac{\partial\log L}{\partial\beta_1} = \sum_{i=1}^{n}\{x_iy_i-x_ie^{\theta_i}\} = 0,$$    (47)

which are nonlinear in $\beta_0$ and $\beta_1$. The most common method of obtaining the MLEs is to
apply the iteratively reweighted least squares algorithm.
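
The following sketch solves the two Poisson likelihood equations, $\sum(y_i-\lambda_i) = 0$ and $\sum x_i(y_i-\lambda_i) = 0$ with $\lambda_i = \exp(\beta_0+\beta_1x_i)$, by Fisher scoring. For the log link this coincides with Newton-Raphson and is algebraically equivalent to the iteratively reweighted least squares iteration, but it is written here as a plain Newton step rather than as a weighted regression. The function name `fit_poisson`, the starting value, and the counts are hypothetical.

```python
from math import exp, log


def fit_poisson(x, y, iterations=25):
    """Fisher scoring for independent Y_i ~ Poisson(exp(b0 + b1 x_i))."""
    # Start from the intercept-only fit: b0 = log(mean of y), b1 = 0.
    b0, b1 = log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        lam = [exp(b0 + b1 * xi) for xi in x]
        # Score vector from the two likelihood equations.
        u0 = sum(yi - li for yi, li in zip(y, lam))
        u1 = sum(xi * (yi - li) for xi, yi, li in zip(x, y, lam))
        # Information matrix: the weights are the fitted means lambda_i.
        i00 = sum(lam)
        i01 = sum(li * xi for li, xi in zip(lam, x))
        i11 = sum(li * xi * xi for li, xi in zip(lam, x))
        det = i00 * i11 - i01 * i01
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (-i01 * u0 + i00 * u1) / det
    return b0, b1


# Hypothetical counts observed at x = 0, 1, 2, 3, 4.
x = [0, 1, 2, 3, 4]
y = [1, 2, 2, 4, 7]
b0_hat, b1_hat = fit_poisson(x, y)
print(b0_hat, b1_hat)
```

At the solution the fitted means add up to the observed total, which is exactly the first likelihood equation.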

Once the MLEs of $\beta_0$ and $\beta_1$ are computed, one can compute the SEs of the estimates
by using methods of Section 8.7. Using $\mathrm{SE}(\hat\beta_1)$, for example, one can test hypotheses
concerning $\beta_1$ or construct a $(1-\alpha)$-level confidence interval for $\beta_1$.

For a detailed discussion of logistic and Poisson regression we refer to Agresti [1].
A wide variety of software is available, which can be used to carry out the computations
required.


PROBLEMS 12.3 


1. Prove statements (12), (13), and (14).

2. Prove statements (15) and (16).

3. Prove statement (19).

4. Obtain tests of null hypotheses $\beta_0 = \beta_0'$, $\beta_1 = \beta_1'$, and $(\beta_0,\beta_1)' = (\beta_0',\beta_1')'$, where
$\beta_0'$, $\beta_1'$ are given real numbers.

5. Obtain the confidence intervals for $\beta_0$ and $\beta_1$ as given in (28) and (29), respectively.

6. Derive the expression for $\operatorname{var}\{\hat{E}\{Y\mid x_0\}\}$ as given in (31).

7. Show that the interval given in (35) is a $(1-\alpha)$-level confidence interval for $Y_0 =
\beta_0+\beta_1x_0+\varepsilon$, the estimated value of $Y$ at $x_0$.

8. Suppose that the regression of $Y$ on the (mathematical) variable $x$ is a quadratic

$$Y_i = \beta_0+\beta_1x_i+\beta_2x_i^2+\varepsilon_i,$$

where $\beta_0,\beta_1,\beta_2$ are unknown parameters, $x_1,x_2,\ldots,x_n$ are known values of $x$, and
$\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n$ are unobservable RVs that are assumed to be independently normally
distributed with common mean 0 and common variance $\sigma^2$ (see Example 12.2.3).
Assume that the coefficient vectors $(x_1^k,x_2^k,\ldots,x_n^k)$, $k=0,1,2$, are linearly independent.
Write the normal equations for estimating the $\beta$'s and derive the generalized
likelihood ratio test of $\beta_2 = 0$.

9. Suppose that the $Y$'s can be written as

$$Y_i = \beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3}+\varepsilon_i,$$

where $x_{i1},x_{i2},x_{i3}$ are three mathematical variables, and $\varepsilon_i$ are iid $N(0,1)$ RVs.
Assuming that the matrix $X$ (see Example 3) is of full rank, write the normal
equations and derive the likelihood ratio test of the null hypothesis $H_0\colon \beta_1 = \beta_2 = \beta_3$.

10. The following table gives the weight $Y$ (grams) of a crystal suspended in a saturated
solution against the time suspended $T$ (days):

Time T   |  0    1    2    3    4    5    6
Weight Y | 0.4  0.7  1.1  1.6  1.9  2.3  2.6

(a) Find the linear regression line of $Y$ on $T$.
(b) Test the hypothesis that $\beta_0 = 0$ in the linear regression model $Y_i = \beta_0+\beta_1T_i+\varepsilon_i$.
(c) Obtain a 0.95 level confidence interval for $\beta_0$.

11. Let $\theta_i = \pi_i/(1-\pi_i)$ be the odds ratio corresponding to $x_i$, $i=1,2,\ldots,n$. By considering
the ratio $\theta_{i+1}/\theta_i$, how will you interpret the value of the slope parameter $\beta_1$?

12. Do the same for parameter $\beta_1$ in the Poisson regression model by considering the
ratio $\lambda_{i+1}/\lambda_i$.

12.4 ONE-WAY ANALYSIS OF VARIANCE 


In this section we return to the problem of one-way analysis of variance considered in 
Examples 12.2.1 and 12.2.4. Consider the model 


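$$Y_{ij} = \mu_i+\varepsilon_{ij},\qquad j=1,2,\ldots,n_i;\quad i=1,2,\ldots,k,$$    (1)

as described in Example 12.2.4. In matrix notation we write

$$Y = X\beta+\varepsilon,$$    (2)

where

$$Y = (Y_{11},Y_{12},\ldots,Y_{1n_1},Y_{21},Y_{22},\ldots,Y_{2n_2},\ldots,Y_{k1},Y_{k2},\ldots,Y_{kn_k})',$$

$$\beta = (\mu_1,\mu_2,\ldots,\mu_k)',$$

$$X = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1_{n_k}\end{pmatrix},$$

$$\varepsilon = (\varepsilon_{11},\varepsilon_{12},\ldots,\varepsilon_{1n_1},\varepsilon_{21},\varepsilon_{22},\ldots,\varepsilon_{2n_2},\ldots,\varepsilon_{k1},\varepsilon_{k2},\ldots,\varepsilon_{kn_k})'.$$

As in Example 12.2.4, $Y$ is a vector of $n$ observations ($n = \sum_{i=1}^{k}n_i$), whose components
$Y_{ij}$ are subject to random error $\varepsilon_{ij}\sim N(0,\sigma^2)$, $\beta$ is a vector of $k$ unknown parameters,
and $X$ is a design matrix. We wish to find a test of $H_0\colon \mu_1=\mu_2=\cdots=\mu_k$ against all
alternatives. We may write $H_0$ in the form $H\beta = 0$, where $H$ is a $(k-1)\times k$ matrix of
rank $(k-1)$, which can be chosen to be

$$H = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0\\ 1 & 0 & -1 & \cdots & 0\\ \vdots & \vdots & & \ddots & \vdots\\ 1 & 0 & 0 & \cdots & -1\end{pmatrix}.$$

Let us write $\mu_1=\mu_2=\cdots=\mu_k=\mu$ under $H_0$. The joint PDF of $Y$ is given by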


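$$f(y;\mu_1,\mu_2,\ldots,\mu_k,\sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\mu_i)^2\right\}$$    (3)

and, under $H_0$, by

$$f(y;\mu,\sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\mu)^2\right\}.$$    (4)

It is easy to check that the MLEs are

$$\hat\mu_i = \frac{\sum_{j=1}^{n_i}y_{ij}}{n_i} = \bar{y}_{i\cdot},\qquad i=1,2,\ldots,k,$$    (5)

$$\hat\sigma^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_{i\cdot})^2}{n},$$    (6)

$$\hat{\hat\mu} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}y_{ij}}{n} = \bar{y},$$    (7)

and

$$\hat{\hat\sigma}^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^2}{n}.$$    (8)

By Theorem 12.2.1, the likelihood ratio test is to reject $H_0$ if

$$\frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y})^2-\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i\cdot})^2}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i\cdot})^2}\left(\frac{n-k}{k-1}\right) > F_0,$$    (9)

where $F_0$ is the upper $\alpha$ percent point in the $F(k-1,n-k)$ distribution. Since

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i\cdot}+\bar{Y}_{i\cdot}-\bar{Y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i\cdot})^2+\sum_{i=1}^{k}n_i(\bar{Y}_{i\cdot}-\bar{Y})^2,$$    (10)

we may rewrite (9) as

$$\frac{\sum_{i=1}^{k}n_i(\bar{Y}_{i\cdot}-\bar{Y})^2/(k-1)}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i\cdot})^2/(n-k)} > F_0.$$    (11)

It is usual to call the sum of squares in the numerator of (11) the between sum of squares
(BSS) and the sum of squares in the denominator of (11) the within sum of squares (WSS).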


The results are conveniently displayed in a so-called analysis of variance table in the 
following form. 


One-Way Analysis of Variance

Source of    Sum of Squares                                                    Degrees of   Mean Sum       F-Ratio
Variation                                                                      Freedom      of Squares

Between      BSS = $\sum_{i=1}^{k} n_i(\bar{Y}_{i\cdot}-\bar{Y})^2$            k - 1        BSS/(k - 1)    [BSS/(k - 1)] / [WSS/(n - k)]

Within       WSS = $\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i\cdot})^2$  n - k        WSS/(n - k)

Mean         $n\bar{Y}^2$                                                      1

Total        TSS = $\sum_{i=1}^{k}\sum_{j=1}^{n_i}Y_{ij}^2$                    n

The third row, designated “Mean,” has been included to make the total of the second
column add up to the total sum of squares (TSS), $\sum_{i=1}^{k}\sum_{j=1}^{n_i}Y_{ij}^2$.
Example 1. The lifetimes (in hours) of samples from three different brands of batteries
were recorded with the following results:

        Brand
 Y_1    Y_2    Y_3
  40     60     60
  30     40     50
  50     55     70
  50     65     65
  30            75
                40

We wish to test whether the three brands have different average lifetimes. We will assume
that the three samples come from normal populations with common (unknown) standard
deviation $\sigma$.

From the data $n_1 = 5$, $n_2 = 4$, $n_3 = 6$, $n = 15$, and

$$\bar{y}_1 = \frac{200}{5} = 40,\qquad \bar{y}_2 = \frac{220}{4} = 55,\qquad \bar{y}_3 = \frac{360}{6} = 60,$$

$$\sum_{i=1}^{5}(y_{1i}-\bar{y}_1)^2 = 400,\qquad \sum_{i=1}^{4}(y_{2i}-\bar{y}_2)^2 = 350,\qquad \sum_{i=1}^{6}(y_{3i}-\bar{y}_3)^2 = 850.$$

Also, the grand mean is

$$\bar{y} = \frac{200+220+360}{15} = \frac{780}{15} = 52.$$

Thus

$$\text{BSS} = 5(40-52)^2+4(55-52)^2+6(60-52)^2 = 1140,$$

$$\text{WSS} = 400+350+850 = 1600.$$

Analysis of Variance

Source     SS      d.f.   MSS      F-Ratio
Between    1140    2      570      570/133.33 = 4.28
Within     1600    12     133.33

Choosing $\alpha = 0.05$, we see that $F_0 = F_{2,12,0.05} = 3.89$. Thus we reject $H_0\colon \mu_1 = \mu_2 = \mu_3$
at level $\alpha = 0.05$.
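
The sums of squares in this example can be reproduced with a few lines of Python. This is an illustrative sketch only; the function name `one_way_anova` is hypothetical, and the critical value still has to be taken from an $F$ table.

```python
def one_way_anova(groups):
    """Return (BSS, WSS, F-ratio) for a list of samples, one list per treatment."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between sum of squares: group sizes times squared deviations of group means.
    bss = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within sum of squares: squared deviations from each group's own mean.
    wss = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f_ratio = (bss / (k - 1)) / (wss / (n - k))
    return bss, wss, f_ratio


# Battery lifetimes from the example, one list per brand.
brands = [[40, 30, 50, 50, 30], [60, 40, 55, 65], [60, 50, 70, 65, 75, 40]]
bss, wss, f_ratio = one_way_anova(brands)
print(bss, wss, round(f_ratio, 2))
```

The resulting F-ratio is compared with the upper $\alpha$ percent point of $F(k-1,n-k)$, here $F(2,12)$.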


Example 2. Three sections of the same elementary statistics course were taught by three
instructors. The final grades of students were recorded as follows:

     Instructor
  I     II    III
  95    88    68
  33    78    79
  48    91    91
  76    51    71
  89    85    87
  82    77    68
  60    31    79
  77    62    16
        96    35
        81

Let us test the hypothesis that the average grades given by the three instructors are the
same at level $\alpha = 0.05$.

From the data $n_1 = 8$, $n_2 = 10$, $n_3 = 9$, $n = 27$, $\bar{y}_1 = 70$, $\bar{y}_2 = 74$, $\bar{y}_3 = 66$,
$\sum_{i=1}^{8}(y_{1i}-\bar{y}_1)^2 = 3168$, $\sum_{i=1}^{10}(y_{2i}-\bar{y}_2)^2 = 3686$, $\sum_{i=1}^{9}(y_{3i}-\bar{y}_3)^2 = 4898$. Also, the grand
mean is

$$\bar{y} = \frac{560+740+594}{27} = \frac{1894}{27} = 70.15.$$

Thus

$$\text{BSS} = 8(0.15)^2+10(3.85)^2+9(4.15)^2 = 303.4075,$$

$$\text{WSS} = 3168+3686+4898 = 11{,}752.$$

Analysis of Variance

Source     SS          d.f.   MSS      F-Ratio
Between    303.41      2      151.70   151.70/489.67 = 0.31
Within     11,752.00   24     489.67

We therefore cannot reject the null hypothesis that the average grades given by the three
instructors are the same.


PROBLEMS 12.4 


1. Prove statements (5), (6), (7), and (8). 


2. The following are the coded values of the amounts of corn (in bushels per acre) 
obtained from four varieties, using unequal number of plots for the different 
varieties: 


A: 2, 4, 1, 3, 2
B: 3, 4, 2, 3, 4, 2
C: 6, 4, 8
D: 7, 6, 7, 4


Test whether there is a significant difference between the yields of the varieties. 

3. A consumer interested in buying a new car has reduced his search to six different 
brands: D, F, G, P, V, T. He would like to buy the brand that gives the highest 
mileage per gallon of regular gasoline. One of his friends advises him that he should 
use some other method of selection, since the average mileages of the six brands are 
the same, and offers the following data in support of her assertion. 


Distance Traveled (Miles) per Gallon of Gasoline 


Brand 

Car D F G P V T 
1 42 38 28 32 30 25 
2 35 33 32 36 35 32 
3 37 28 35 27 25 24 
4 37 37 26 30 

5 28 30 

6 19 


Should the consumer accept his friend’s advice? 


4. The following data give the ages of entering freshmen in independent random 
samples from three different universities. 


University 
A B C
17 16 21 
19 16 23 
20 19 22 
21 20 
18 19 


Test the hypothesis that the average ages of entering freshmen at these universities
are the same.

5. Five cigarette manufacturers claim that their product has low tar content. Independent
random samples of cigarettes are taken from each manufacturer and the
following tar levels (in milligrams) are recorded.

Brand   Tar Level (mg)

A       4.2, 4.8, 4.6, 4.0, 4.4
B       4.9, 4.8, 4.7, 5.0, 4.9, 5.2
C       5.4, 5.3, 5.4, 5.2, 5.5
D       5.8, 5.6, 5.5, 5.4, 5.6, 5.8
E       5.9, 6.2, 6.2, 6.8, 6.4, 6.3

Can the differences among the sample means be attributed to chance? 


6. The quantity of oxygen dissolved in water is used as a measure of water pollution. 
Samples are taken at four locations in a lake and the quantity of dissolved oxygen is 
recorded as follows (lower reading corresponds to greater pollution): 


Location   Quantity of Dissolved Oxygen (%)

A          7.8, 6.4, 8.2, 6.9
B          6.7, 6.8, 7.1, 6.9, 7.3
C          7.2, 7.4, 6.9, 6.4, 6.5
D          6.0, 7.4, 6.5, 6.9, 7.2, 6.8


Do the data indicate a significant difference in the average amount of dissolved 
oxygen for the four locations? 


12.5 TWO-WAY ANALYSIS OF VARIANCE WITH ONE OBSERVATION 
PER CELL 


In many practical problems one is interested in investigating the effects of two factors that
influence an outcome. For example, the variety of grain and the type of fertilizer used both
affect the yield of a plot, and the score on a standard examination is influenced by the size
of the class and the instructor.

Let us suppose that two factors affect the outcome of an experiment. Suppose also
that one observation is available at each of a number of levels of these two factors. Let
$Y_{ij}$ $(i=1,2,\ldots,a;\ j=1,2,\ldots,b)$ be the observation when the first factor is at the $i$th level,
and the second factor at the $j$th level. Assume that

$$Y_{ij} = \mu+\alpha_i+\beta_j+\varepsilon_{ij},\qquad i=1,2,\ldots,a;\quad j=1,2,\ldots,b,$$    (1)

where $\alpha_i$ is the effect of the $i$th level of the first factor, $\beta_j$ is the effect of the $j$th level of the
second factor, and $\varepsilon_{ij}$ is the random error, which is assumed to be normally distributed with
mean 0 and variance $\sigma^2$. We will assume that the $\varepsilon_{ij}$'s are independent. It follows that $Y_{ij}$
are independent normal RVs with means $\mu+\alpha_i+\beta_j$ and variance $\sigma^2$. There is no loss of
generality in assuming that $\sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = 0$, for, if $\mu_{ij} = \mu'+\alpha_i'+\beta_j'$, we can write

$$\mu_{ij} = (\mu'+\bar\alpha'+\bar\beta')+(\alpha_i'-\bar\alpha')+(\beta_j'-\bar\beta') = \mu+\alpha_i+\beta_j$$

and $\sum_{i=1}^{a}\alpha_i = 0$, $\sum_{j=1}^{b}\beta_j = 0$. Here we have written $\bar\alpha'$ and $\bar\beta'$ for the means of
the $\alpha_i'$'s and $\beta_j'$'s, respectively. Thus $Y_{ij}$ may denote the yield from the use of the $i$th variety of
some grain and the $j$th type of some fertilizer. The two hypotheses of interest are

$$\alpha_1=\alpha_2=\cdots=\alpha_a=0 \quad\text{and}\quad \beta_1=\beta_2=\cdots=\beta_b=0.$$

The first of these, for example, says that the first factor has no effect on the outcome of
the experiment.

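In view of the fact that $\sum_{i=1}^{a} \alpha_i = 0$ and $\sum_{j=1}^{b} \beta_j = 0$, we have $\alpha_a = -\sum_{i=1}^{a-1} \alpha_i$, $\beta_b = -\sum_{j=1}^{b-1} \beta_j$,
and we can write our model in matrix notation as

\[ \mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \tag{2} \]

where

\[ \mathbf{Y} = (Y_{11}, Y_{12}, \ldots, Y_{1b}, Y_{21}, Y_{22}, \ldots, Y_{2b}, \ldots, Y_{a1}, Y_{a2}, \ldots, Y_{ab})', \]
\[ \boldsymbol{\beta} = (\mu, \alpha_1, \alpha_2, \ldots, \alpha_{a-1}, \beta_1, \beta_2, \ldots, \beta_{b-1})', \]
\[ \boldsymbol{\varepsilon} = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1b}, \varepsilon_{21}, \varepsilon_{22}, \ldots, \varepsilon_{2b}, \ldots, \varepsilon_{a1}, \varepsilon_{a2}, \ldots, \varepsilon_{ab})', \]

and

\[
X = \begin{array}{c|cccc|cccc}
\mu & \alpha_1 & \alpha_2 & \cdots & \alpha_{a-1} & \beta_1 & \beta_2 & \cdots & \beta_{b-1} \\ \hline
1 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
1 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 \\
1 & 1 & 0 & \cdots & 0 & -1 & -1 & \cdots & -1 \\ \hline
1 & 0 & 1 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 1 & \cdots & 0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 1 & \cdots & 0 & -1 & -1 & \cdots & -1 \\ \hline
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \hline
1 & -1 & -1 & \cdots & -1 & 1 & 0 & \cdots & 0 \\
1 & -1 & -1 & \cdots & -1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
1 & -1 & -1 & \cdots & -1 & 0 & 0 & \cdots & 1 \\
1 & -1 & -1 & \cdots & -1 & -1 & -1 & \cdots & -1
\end{array}
\]

The vector of unknown parameters $\boldsymbol{\beta}$ is $(a+b-1) \times 1$ and the matrix $X$ is $ab \times (a+b-1)$
($a$ blocks of $b$ rows each, one block for each level of the first factor). We leave the reader
to check that $X$ is of full rank, $a+b-1$. The hypothesis $H_\alpha\colon \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ or
$H_\beta\colon \beta_1 = \beta_2 = \cdots = \beta_b = 0$ can easily be put into the form $H\boldsymbol{\beta} = \mathbf{0}$. For example, for $H_\beta$ we
can choose $H$ to be the $(b-1) \times (a+b-1)$ matrix of full rank $b-1$, given by

\[
H = \begin{array}{c|cccc|cccc}
\mu & \alpha_1 & \alpha_2 & \cdots & \alpha_{a-1} & \beta_1 & \beta_2 & \cdots & \beta_{b-1} \\ \hline
0 & 0 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1
\end{array}
\]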


Clearly, the model described above is a special case of the general linear hypothesis, and
we can use Theorem 12.2.1 to test $H_\beta$.
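
The full-rank claim for $X$ can be checked numerically for small $a$ and $b$. The following is a minimal sketch, not part of the original development: the helper names `design_matrix` and `rank` are hypothetical, the rows are assumed to be ordered $Y_{11}, \ldots, Y_{1b}, Y_{21}, \ldots, Y_{ab}$ as in (2), and the rank is found by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction


def design_matrix(a, b):
    """Design matrix of model (2) under the sum-to-zero constraints.

    Columns: mu, alpha_1..alpha_{a-1}, beta_1..beta_{b-1}.
    Rows are ordered Y_11, ..., Y_1b, Y_21, ..., Y_ab.
    """
    rows = []
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            # Level a (or b) is coded as -1 in every column of its factor.
            alpha = [-1] * (a - 1) if i == a else [int(i == t) for t in range(1, a)]
            beta = [-1] * (b - 1) if j == b else [int(j == t) for t in range(1, b)]
            rows.append([1] + alpha + beta)
    return rows


def rank(matrix):
    """Rank of a matrix by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((k for k in range(r, len(m)) if m[k][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(r + 1, len(m)):
            factor = m[k][col] / m[r][col]
            m[k] = [x - factor * y for x, y in zip(m[k], m[r])]
        r += 1
    return r


X = design_matrix(3, 4)           # a = 3, b = 4
print(len(X), len(X[0]), rank(X))  # rows, columns, rank
```

For $a = 3$, $b = 4$ the matrix has $ab = 12$ rows and $a + b - 1 = 6$ columns, and the computed rank equals the number of columns. A numerical check of this kind is of course no substitute for the general proof requested in Problem 1.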
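To apply Theorem 12.2.1 we need the estimators $\hat{\mu}_{ij}$ and $\hat{\hat{\mu}}_{ij}$. It is easily checked that

\[ \hat{\mu} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} y_{ij}}{ab} = \bar{y} \tag{3} \]

and

\[ \hat{\alpha}_i = \bar{y}_{i.} - \bar{y}, \qquad \hat{\beta}_j = \bar{y}_{.j} - \bar{y}, \tag{4} \]

where $\bar{y}_{i.} = \sum_{j=1}^{b} y_{ij}/b$, $\bar{y}_{.j} = \sum_{i=1}^{a} y_{ij}/a$. Also, under $H_\beta$, for example,

\[ \hat{\hat{\mu}} = \bar{y} \quad\text{and}\quad \hat{\hat{\alpha}}_i = \bar{y}_{i.} - \bar{y}. \tag{5} \]

In the notation of Theorem 12.2.1, $n = ab$, $k = a+b-1$, $r = b-1$, so that $n - k =
ab - a - b + 1 = (a-1)(b-1)$, and

\[ F = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.})^2 - \sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y})^2}{\sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y})^2}. \tag{6} \]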


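Since

\[ \sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.})^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} \{ (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}) + (\bar{Y}_{.j} - \bar{Y}) \}^2 \]
\[ \qquad = \sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y})^2 + a \sum_{j=1}^{b} (\bar{Y}_{.j} - \bar{Y})^2, \tag{7} \]

we may write

\[ F = \frac{a \sum_{j=1}^{b} (\bar{Y}_{.j} - \bar{Y})^2}{\sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y})^2}. \tag{8} \]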


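It follows that, under $H_\beta$, $(a-1)F$ has a central $F(b-1, (a-1)(b-1))$ distribution.
The numerator of $F$ in (8) measures the variability between the means $\bar{Y}_{.j}$, and the
denominator measures the variability that exists once the effects due to the two factors
have been subtracted.
If $H_\alpha$ is the null hypothesis to be tested, one can show that under $H_\alpha$ the MLEs are

\[ \hat{\hat{\mu}} = \bar{y} \quad\text{and}\quad \hat{\hat{\beta}}_j = \bar{y}_{.j} - \bar{y}. \tag{9} \]

As before, $n = ab$, $k = a+b-1$, but $r = a-1$. Also,

\[ F = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{.j})^2 - \sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y})^2}{\sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y})^2}, \tag{10} \]

which may be rewritten as

\[ F = \frac{b \sum_{i=1}^{a} (\bar{Y}_{i.} - \bar{Y})^2}{\sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y})^2}. \tag{11} \]

It follows that, under $H_\alpha$, $(b-1)F$ has a central $F(a-1, (a-1)(b-1))$ distribution. The
numerator of $F$ in (11) measures the variability between the means $\bar{Y}_{i.}$.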
If the data are put into the following form:

                          Levels of Factor 2

                      1          2         ...       b          Row Means

Levels        1     $Y_{11}$    $Y_{12}$    ...    $Y_{1b}$    $\bar{Y}_{1.}$
of            2     $Y_{21}$    $Y_{22}$    ...    $Y_{2b}$    $\bar{Y}_{2.}$
Factor 1     ...      ...        ...       ...      ...          ...
              a     $Y_{a1}$    $Y_{a2}$    ...    $Y_{ab}$    $\bar{Y}_{a.}$

Column Means        $\bar{Y}_{.1}$   $\bar{Y}_{.2}$   ...   $\bar{Y}_{.b}$   $\bar{Y}$

so that the rows represent various levels of factor 1, and the columns, the levels of factor 2,
one can write

\[ \text{between sum of squares for rows} = b \sum_{i=1}^{a} (\bar{Y}_{i.} - \bar{Y})^2 = \text{sum of squares for factor 1} = SS_1. \]

Similarly,

\[ \text{between sum of squares for columns} = a \sum_{j=1}^{b} (\bar{Y}_{.j} - \bar{Y})^2 = \text{sum of squares for factor 2} = SS_2. \]


It is usual to write error or residual sum of squares (SSE) for the denominator of (8) or (11).
These results are conveniently presented in an analysis of variance table as follows.


Two-Way Analysis of Variance Table with One Observation per Cell

Source of     Sum of                                   Degrees of      Mean
Variation     Squares                                  Freedom         Square                        F-Ratio

Rows          $SS_1$                                   $a-1$           $MS_1 = SS_1/(a-1)$           $MS_1/MSE$
Columns       $SS_2$                                   $b-1$           $MS_2 = SS_2/(b-1)$           $MS_2/MSE$
Error         $SSE$                                    $(a-1)(b-1)$    $MSE = SSE/[(a-1)(b-1)]$
Mean          $ab\bar{Y}^2$                            1               $ab\bar{Y}^2$
Total         $\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij}^2$  $ab$            $\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij}^2/ab$


Example 1. The following table gives the yield (pounds per plot) of three varieties of 
wheat, obtained with four different kinds of fertilizers. 


                 Variety of Wheat

Fertilizer       A       B       C

$\alpha$         8       3       7
$\beta$         10       4       8
$\gamma$         6       5       6
$\delta$         8       4       7


Let us test the hypothesis of equality in the average yields of the three varieties of wheat 
and the null hypothesis that the four fertilizers are equally effective. 

In our notation, $b = 3$, $a = 4$, $\bar{y}_{1.} = 6$, $\bar{y}_{2.} = 7.33$, $\bar{y}_{3.} = 5.67$, $\bar{y}_{4.} = 6.33$, $\bar{y}_{.1} = 8$, $\bar{y}_{.2} = 4$,
$\bar{y}_{.3} = 7$, $\bar{y} = 6.33$.

Also,

$SS_1$ = sum of squares due to fertilizer
       $= 3[(0.33)^2 + 1^2 + (0.66)^2 + 0^2]$
       $= 4.67$;

$SS_2$ = sum of squares due to variety of wheat
       $= 4[(1.67)^2 + (2.33)^2 + (0.67)^2]$
       $= 34.67$;

$SSE = \sum_{i=1}^{4} \sum_{j=1}^{3} (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y})^2$
       $= 7.33$.


The results are shown in the following table: 


Analysis of Variance

Source               SS        d.f.     MS        F-Ratio

Variety of wheat     34.67     2        17.33     14.2
Fertilizer           4.67      3        1.56      1.28
Error                7.33      6        1.22
Mean                 481.33    1        481.33
Total                528.00    12       44.00


Now $F_{2,6,0.05} = 5.14$ and $F_{3,6,0.05} = 4.76$. Since $14.2 > 5.14$, we reject $H_\beta$, that there is
equality in the average yield of the three varieties; but, since $1.28 \not> 4.76$, we accept $H_\alpha$,
that the four fertilizers are equally effective.
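
The computations of this section are easily mechanized. The sketch below is illustrative only: the function name `two_way_anova` is hypothetical, and the yield matrix is an assumption, chosen so that its row means, column means, and total sum of squares agree with the values quoted in Example 1. It evaluates $SS_1$, $SS_2$, $SSE$ and the two F-ratios of the analysis of variance table.

```python
def two_way_anova(y):
    """Two-way ANOVA with one observation per cell.

    y[i][j] is the observation at level i of factor 1 (rows)
    and level j of factor 2 (columns).
    """
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    row_means = [sum(row) / b for row in y]
    col_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]

    ss1 = b * sum((r - grand) ** 2 for r in row_means)   # factor 1 (rows)
    ss2 = a * sum((c - grand) ** 2 for c in col_means)   # factor 2 (columns)
    sse = sum((y[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(a) for j in range(b))       # residual
    mse = sse / ((a - 1) * (b - 1))
    return {
        "SS1": ss1, "SS2": ss2, "SSE": sse,
        "F_rows": (ss1 / (a - 1)) / mse,
        "F_cols": (ss2 / (b - 1)) / mse,
    }


# Assumed yields (rows: four fertilizers; columns: varieties A, B, C),
# consistent with the summary statistics quoted in Example 1.
yields = [[8, 3, 7], [10, 4, 8], [6, 5, 6], [8, 4, 7]]
result = two_way_anova(yields)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The F-ratio for columns is compared with $F_{2,6,0.05}$ and the F-ratio for rows with $F_{3,6,0.05}$, exactly as in the example. Small differences from the hand computation arise because the code does not round the means.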


PROBLEMS 12.5 


1. Show that the matrix X for the model defined in (2) is of full rank, a+ b—1. 
2. Prove statements (3), (4), (5), and (9). 


3. The following data represent the units of production per day turned out by four 
different brands of machines used by four machinists: 


                     Machinist

Machine      $A_1$    $A_2$    $A_3$    $A_4$

$B_1$        15       14       19       18
$B_2$        17       12       20       16
$B_3$        16       18       16       17
$B_4$        16       16       15       15


Test whether the differences in the performances of the machinists are significant
and also whether the differences in the performances of the four brands of machines
are significant. Use $\alpha = 0.05$.

4. Students were classified into four ability groups, and three different teaching
methods were employed. The following table gives the mean for four groups:


                    Teaching Method

Ability Group       A       B       C

1                   15      19      14
2                   18      17      12
3                   22      25      17
4                   17      21      19


Test the hypothesis that the teaching methods yield the same results. That is, that the 
teaching methods are equally effective. 

5. The following table shows the yield (pounds per plot) of four varieties of wheat, 
obtained with three different kinds of fertilizers. 


                 Variety of Wheat

Fertilizer       A       B       C       D

$\alpha$         8       3       6       7
$\beta$         10       4       5       8
$\gamma$         8       4       6       7


Test the hypotheses that the four varieties of wheat yield the same average yield and 
that the three fertilizers are equally effective. 


12.6 TWO-WAY ANALYSIS OF VARIANCE WITH INTERACTION 


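The model described in Section 12.5 assumes that the two factors act independently, that
is, are additive. In practice this is an assumption that needs testing. In this section we allow
for the possibility that the two factors might jointly affect the outcome, that is, there might
be so-called interactions. More precisely, if $Y_{ij}$ is the observation in the $(i,j)$th cell, we
will consider the model

\[ Y_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ij}, \tag{1} \]

where $\alpha_i$ $(i = 1,2,\ldots,a)$ represent row effects (or effects due to factor 1), $\beta_j$ $(j = 1,2,\ldots,b)$
represent column effects (or effects due to factor 2), and $\gamma_{ij}$ represent interactions or joint
effects. We will assume that the $\varepsilon_{ij}$ are independently $N(0, \sigma^2)$. We will further assume that

\[ \sum_{i=1}^{a} \alpha_i = 0 = \sum_{j=1}^{b} \beta_j \quad\text{and}\quad \sum_{j=1}^{b} \gamma_{ij} = 0 \ \text{for all } i, \qquad \sum_{i=1}^{a} \gamma_{ij} = 0 \ \text{for all } j. \tag{2} \]

The hypothesis of interest is

\[ H_0\colon \gamma_{ij} = 0 \quad\text{for all } i, j. \tag{3} \]

One may also be interested in testing that all $\alpha$'s are 0 or that all $\beta$'s are 0 in the presence
of interactions $\gamma_{ij}$.
We first note that (2) is not restrictive since we can write

\[ Y_{ij} = \mu' + \alpha_i' + \beta_j' + \gamma_{ij}' + \varepsilon_{ij}, \]

where $\alpha_i'$, $\beta_j'$, and $\gamma_{ij}'$ do not satisfy (2), as

\[ Y_{ij} = (\mu' + \bar{\alpha}' + \bar{\beta}' + \bar{\gamma}') + (\alpha_i' - \bar{\alpha}' + \bar{\gamma}_{i.}' - \bar{\gamma}') + (\beta_j' - \bar{\beta}' + \bar{\gamma}_{.j}' - \bar{\gamma}') \]
\[ \qquad + (\gamma_{ij}' - \bar{\gamma}_{i.}' - \bar{\gamma}_{.j}' + \bar{\gamma}') + \varepsilon_{ij}, \]

and then (2) is satisfied by choosing

\[ \mu = \mu' + \bar{\alpha}' + \bar{\beta}' + \bar{\gamma}', \]
\[ \alpha_i = \alpha_i' - \bar{\alpha}' + \bar{\gamma}_{i.}' - \bar{\gamma}', \]
\[ \beta_j = \beta_j' - \bar{\beta}' + \bar{\gamma}_{.j}' - \bar{\gamma}', \]
\[ \gamma_{ij} = \gamma_{ij}' - \bar{\gamma}_{i.}' - \bar{\gamma}_{.j}' + \bar{\gamma}'. \]

Here

\[ \bar{\alpha}' = a^{-1} \sum_{i=1}^{a} \alpha_i', \qquad \bar{\beta}' = b^{-1} \sum_{j=1}^{b} \beta_j', \qquad \bar{\gamma}_{i.}' = b^{-1} \sum_{j=1}^{b} \gamma_{ij}', \]
\[ \bar{\gamma}_{.j}' = a^{-1} \sum_{i=1}^{a} \gamma_{ij}', \quad\text{and}\quad \bar{\gamma}' = (ab)^{-1} \sum_{i=1}^{a} \sum_{j=1}^{b} \gamma_{ij}'. \]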
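
The reparametrization can be verified on a small numerical case. In the following sketch the function name `center` and the input values are hypothetical; exact rational arithmetic is used so that the side conditions (2) and the equality of the cell means can be checked without rounding error.

```python
from fractions import Fraction as Fr


def center(mu, alpha, beta, gamma):
    """Map arbitrary (mu', alpha', beta', gamma') to parameters satisfying (2)."""
    a, b = len(alpha), len(beta)
    a_bar = sum(alpha) / a
    b_bar = sum(beta) / b
    g_row = [sum(gamma[i]) / b for i in range(a)]                       # row means of gamma'
    g_col = [sum(gamma[i][j] for i in range(a)) / a for j in range(b)]  # column means
    g_bar = sum(map(sum, gamma)) / (a * b)

    mu_c = mu + a_bar + b_bar + g_bar
    alpha_c = [alpha[i] - a_bar + g_row[i] - g_bar for i in range(a)]
    beta_c = [beta[j] - b_bar + g_col[j] - g_bar for j in range(b)]
    gamma_c = [[gamma[i][j] - g_row[i] - g_col[j] + g_bar for j in range(b)]
               for i in range(a)]
    return mu_c, alpha_c, beta_c, gamma_c


# Arbitrary unconstrained parameters for a = 2, b = 3 (illustrative values only).
mu0 = Fr(5)
alpha0 = [Fr(1), Fr(2)]
beta0 = [Fr(0), Fr(3), Fr(-1)]
gamma0 = [[Fr(1), Fr(0), Fr(2)], [Fr(4), Fr(-1), Fr(3)]]

mu_c, alpha_c, beta_c, gamma_c = center(mu0, alpha0, beta0, gamma0)

# The cell means mu + alpha_i + beta_j + gamma_ij are unchanged.
same_means = all(
    mu0 + alpha0[i] + beta0[j] + gamma0[i][j]
    == mu_c + alpha_c[i] + beta_c[j] + gamma_c[i][j]
    for i in range(2) for j in range(3)
)
print(sum(alpha_c), sum(beta_c), same_means)
```

The centered parameters sum to zero in the sense of (2) while every cell mean is left unchanged, which is exactly the content of the remark that (2) is not restrictive.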


Next note that, unless we replicate, that is, take more than one observation per cell, 
there are no degrees of freedom left to estimate the error SS (see Remark 1). 

Let $Y_{ijs}$ be the $s$th observation when the first factor is at the $i$th level, and the second
factor at the $j$th level, $i = 1,2,\ldots,a$, $j = 1,2,\ldots,b$, $s = 1,2,\ldots,m\ (> 1)$. Then the model
becomes as follows:

                              Levels of Factor 2

Levels of Factor 1      1            2           ...        b

        1           $Y_{111}$    $Y_{121}$    ...    $Y_{1b1}$
                      ...          ...        ...      ...
                    $Y_{11m}$    $Y_{12m}$    ...    $Y_{1bm}$

        2           $Y_{211}$    $Y_{221}$    ...    $Y_{2b1}$
                      ...          ...        ...      ...
                    $Y_{21m}$    $Y_{22m}$    ...    $Y_{2bm}$

       ...            ...          ...        ...      ...

        a           $Y_{a11}$    $Y_{a21}$    ...    $Y_{ab1}$
                      ...          ...        ...      ...
                    $Y_{a1m}$    $Y_{a2m}$    ...    $Y_{abm}$

\[ Y_{ijs} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijs}, \tag{4} \]

$i = 1,2,\ldots,a$, $j = 1,2,\ldots,b$, and $s = 1,2,\ldots,m$, where the $\varepsilon_{ijs}$'s are independent $N(0, \sigma^2)$.
We assume that $\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{i=1}^{a} \gamma_{ij} = \sum_{j=1}^{b} \gamma_{ij} = 0$. Suppose that we wish to
test $H_\alpha\colon \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$. We leave the reader to check that model (4) is then a
special case of the general linear hypothesis with $n = abm$, $k = ab$, $r = a-1$, and $n - k =
ab(m-1)$.


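Let us write

\[ \bar{Y} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{s=1}^{m} Y_{ijs}}{n}, \qquad \bar{Y}_{ij.} = \frac{\sum_{s=1}^{m} Y_{ijs}}{m}, \]
\[ \bar{Y}_{i..} = \frac{\sum_{j=1}^{b} \sum_{s=1}^{m} Y_{ijs}}{mb}, \qquad \bar{Y}_{.j.} = \frac{\sum_{i=1}^{a} \sum_{s=1}^{m} Y_{ijs}}{am}. \tag{5} \]

Then it can be easily checked that

\[ \hat{\mu} = \hat{\hat{\mu}} = \bar{Y}, \qquad \hat{\alpha}_i = \bar{Y}_{i..} - \bar{Y}, \qquad \hat{\beta}_j = \hat{\hat{\beta}}_j = \bar{Y}_{.j.} - \bar{Y}, \]
\[ \hat{\gamma}_{ij} = \hat{\hat{\gamma}}_{ij} = \bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}. \tag{6} \]

It follows from Theorem 12.2.1 that

\[ F = \frac{\sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.} + \bar{Y}_{i..} - \bar{Y})^2 - \sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.})^2}{\sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.})^2}. \tag{7} \]

Since

\[ \sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.} + \bar{Y}_{i..} - \bar{Y})^2 = \sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.})^2 + \sum_{i}\sum_{j}\sum_{s} (\bar{Y}_{i..} - \bar{Y})^2, \]

we can write (7) as

\[ F = \frac{mb \sum_{i=1}^{a} (\bar{Y}_{i..} - \bar{Y})^2}{\sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.})^2}. \tag{8} \]

Under $H_\alpha$, the statistic $[ab(m-1)/(a-1)]F$ has the central $F(a-1, ab(m-1))$ distribution,
so that the likelihood ratio test rejects $H_\alpha$ if

\[ \frac{ab(m-1)}{a-1} \cdot \frac{mb \sum_{i=1}^{a} (\bar{Y}_{i..} - \bar{Y})^2}{\sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.})^2} > c. \tag{9} \]

A similar analysis holds for testing $H_\beta\colon \beta_1 = \beta_2 = \cdots = \beta_b = 0$.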


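Next consider the test of hypothesis $H_\gamma\colon \gamma_{ij} = 0$ for all $i,j$, that is, that the two factors are
independent and the effects are additive. In this case $n = abm$, $k = ab$, $r = (a-1)(b-1)$,
and $n - k = ab(m-1)$. It can be shown that

\[ \hat{\hat{\mu}} = \bar{Y}, \qquad \hat{\hat{\alpha}}_i = \bar{Y}_{i..} - \bar{Y}, \quad\text{and}\quad \hat{\hat{\beta}}_j = \bar{Y}_{.j.} - \bar{Y}. \tag{10} \]

Thus

\[ F = \frac{\sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y})^2 - \sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.})^2}{\sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.})^2}. \tag{11} \]

Now

\[ \sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y})^2 = \sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.} + \bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y})^2 \]
\[ \qquad = \sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.})^2 + \sum_{i}\sum_{j}\sum_{s} (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y})^2, \]

so that we may write

\[ F = \frac{\sum_{i}\sum_{j}\sum_{s} (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y})^2}{\sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.})^2}. \tag{12} \]

Under $H_\gamma$, the statistic $\{(m-1)ab/[(a-1)(b-1)]\}F$ has the $F((a-1)(b-1), ab(m-1))$
distribution. The likelihood ratio test rejects $H_\gamma$ if

\[ \frac{(m-1)ab}{(a-1)(b-1)} \cdot \frac{m \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y})^2}{\sum_{i}\sum_{j}\sum_{s} (Y_{ijs} - \bar{Y}_{ij.})^2} > c. \tag{13} \]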
Let us write

$SS_1$ = sum of squares due to factor 1 (row sum of squares)
       $= bm \sum_{i=1}^{a} (\bar{Y}_{i..} - \bar{Y})^2$,

$SS_2$ = sum of squares due to factor 2 (column sum of squares)
       $= am \sum_{j=1}^{b} (\bar{Y}_{.j.} - \bar{Y})^2$,

$SSI$ = sum of squares due to interaction
       $= m \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y})^2$,

and

$SSE$ = sum of squares due to error (residual sum of squares)
       $= \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{s=1}^{m} (Y_{ijs} - \bar{Y}_{ij.})^2$.

Then we may summarize the above results in the following table.


Two-Way Analysis of Variance Table with Interaction

Source of       Sum of                            Degrees of      Mean
Variation       Squares                           Freedom         Square                          F-Ratio

Rows            $SS_1$                            $a-1$           $MS_1 = SS_1/(a-1)$             $MS_1/MSE$
Columns         $SS_2$                            $b-1$           $MS_2 = SS_2/(b-1)$             $MS_2/MSE$
Interaction     $SSI$                             $(a-1)(b-1)$    $MSI = SSI/[(a-1)(b-1)]$        $MSI/MSE$
Error           $SSE$                             $ab(m-1)$       $MSE = SSE/[ab(m-1)]$
Mean            $abm\bar{Y}^2$                    1               $abm\bar{Y}^2$
Total           $\sum_{i}\sum_{j}\sum_{s} Y_{ijs}^2$   $abm$      $\sum_{i}\sum_{j}\sum_{s} Y_{ijs}^2/abm$


Remark 1. Note that, if $m = 1$, there are no d.f. associated with the SSE. Indeed, $SSE = 0$
if $m = 1$. Hence, we cannot make tests of hypotheses when $m = 1$, and for this reason we
assume $m > 1$.


Example 1. To test the effectiveness of three different teaching methods, three instructors 
were randomly assigned 12 students each. The students were then randomly assigned to 
the different teaching methods and were taught exactly the same material. At the con- 
clusion of the experiment, identical examinations were given to the students with the 
following results in regard to grades. 


Teaching              Instructor
Method          I        II       III

1               95       60       86
                85       90       77
                74       80       75
                74       70       70
2               90       89       83
                80       90       70
                92       91       75
                82       86       72
3               70       68       74
                80       73       86
                85       78       91
                85       93       89


From the data the table of means is as follows:

                      Instructor
Method          I        II       III      $\bar{y}_{i..}$

1               82       75       77       78.0
2               86       89       75       83.3
3               80       78       85       81.0

$\bar{y}_{.j.}$         82.7     80.7     79.0     $\bar{y} = 80.8$

Then

$SS_1$ = sum of squares due to methods
       $= bm \sum_{i=1}^{3} (\bar{y}_{i..} - \bar{y})^2$
       $= 3 \times 4 \times 14.13 = 169.56$,

$SS_2$ = sum of squares due to instructors
       $= am \sum_{j=1}^{3} (\bar{y}_{.j.} - \bar{y})^2$
       $= 3 \times 4 \times 6.86 = 82.32$,

$SSI$ = sum of squares due to interaction
       $= m \sum_{i=1}^{3} \sum_{j=1}^{3} (\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y})^2$
       $= 4 \times 140.45 = 561.80$,

$SSE$ = residual sum of squares
       $= \sum_{i=1}^{3} \sum_{j=1}^{3} \sum_{s=1}^{4} (y_{ijs} - \bar{y}_{ij.})^2 = 1830.00$.


Analysis of Variance

Source           SS          d.f.     MS         F-Ratio

Methods          169.56      2        84.78      1.25
Instructors      82.32       2        41.16      0.61
Interactions     561.80      4        140.45     2.07
Error            1830.00     27       67.78


With $\alpha = 0.05$, we see from the tables that $F_{2,27,0.05} = 3.35$ and $F_{4,27,0.05} = 2.73$, so that
we cannot reject any of the three hypotheses that the three methods are equally effective,
that the three instructors are equally effective, and that the interactions are all 0.
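
The analysis of variance with replication can be sketched in the same way. The function name `anova_with_interaction` is hypothetical; the grades are those of Example 1, arranged as `grades[method][instructor]`, each cell holding its $m = 4$ replicates.

```python
def anova_with_interaction(y):
    """Two-way ANOVA with m replicates per cell; y[i][j] is the list of replicates."""
    a, b, m = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / m for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a

    ss1 = b * m * sum((r - grand) ** 2 for r in row)
    ss2 = a * m * sum((c - grand) ** 2 for c in col)
    ssi = m * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                  for i in range(a) for j in range(b))
    sse = sum((obs - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for obs in y[i][j])
    mse = sse / (a * b * (m - 1))
    return {
        "SS1": ss1, "SS2": ss2, "SSI": ssi, "SSE": sse,
        "F_rows": (ss1 / (a - 1)) / mse,
        "F_cols": (ss2 / (b - 1)) / mse,
        "F_interaction": (ssi / ((a - 1) * (b - 1))) / mse,
    }


# grades[method][instructor] -> four students' grades (Example 1).
grades = [
    [[95, 85, 74, 74], [60, 90, 80, 70], [86, 77, 75, 70]],
    [[90, 80, 92, 82], [89, 90, 91, 86], [83, 70, 75, 72]],
    [[70, 80, 85, 85], [68, 73, 78, 93], [74, 86, 91, 89]],
]
res = anova_with_interaction(grades)
for name, value in res.items():
    print(f"{name}: {value:.2f}")
```

Because the code carries unrounded means, its sums of squares for methods and instructors differ slightly from the hand computation above, which uses means rounded to one decimal place. The three F-ratios remain below the tabulated critical values, so the conclusions are unchanged.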


PROBLEMS 12.6 


1. Prove statement (6).
2. Obtain the likelihood ratio test of the null hypothesis $H_\beta\colon \beta_1 = \beta_2 = \cdots = \beta_b = 0$.
3. Prove statement (10).
4. Suppose that the following data represent the units of production turned out each day
by three different machinists, each working on the same machine for three different
days:

                         Machinist

Machine      A              B              C

$B_1$        15, 15, 17     19, 19, 16     16, 18, 21
$B_2$        17, 17, 17     15, 15, 15     19, 22, 22
$B_3$        15, 17, 16     18, 17, 16     18, 18, 18
$B_4$        18, 20, 22     15, 16, 17     17, 17, 17


Using a 0.05 level of significance, test whether (a) the differences among the machinists
are significant, (b) the differences among the machines are significant, and (c)
the interactions are significant.


5. In an experiment to determine whether four different makes of automobiles average 
the same gasoline mileage, a random sample of two cars of each make was taken 
from each of four cities. Each car was then test run on 5 gallons of gasoline of the 
same brand. The following table gives the number of miles traveled. 


                                 Automobile Make

Cities            A               B               C                D

Cleveland         92.3, 104.1     90.4, 103.8     110.2, 115.0     120.0, 125.4
Detroit           96.2, 98.6      91.8, 100.4     112.3, 111.7     124.1, 121.1
San Francisco     90.8, 96.2      90.3, 89.1      107.2, 103.8     118.4, 115.6
Denver            98.5, 97.3      96.8, 98.8      115.2, 110.2     126.2, 120.4


Construct the analysis of variance table. Test the hypothesis of no automobile effect,
no city effect, and no interactions. Use $\alpha = 0.05$.


NONPARAMETRIC STATISTICAL
INFERENCE


13.1 INTRODUCTION 


In all the problems of statistical inference considered so far, we assumed that the distribu- 
tion of the random variable being sampled is known except, perhaps, for some parameters. 
In practice, however, the functional form of the distribution is seldom, if ever, known. 
It is therefore desirable to devise methods that are free of this assumption concerning 
distribution. In this chapter we study some procedures that are commonly referred to as 
distribution-free or nonparametric methods. The term “distribution-free” refers to the fact 
that no assumptions are made about the underlying distribution except that the distribution 
function being sampled is absolutely continuous. The term “nonparametric” refers to the 
fact that there are no parameters involved in the traditional sense of the term “parameter” 
used thus far. To be sure, there is a parameter which indexes the family of absolutely
continuous DFs, but it is not numerical and hence the parameter set cannot be represented as a
subset of $\mathcal{R}_n$, for any $n \ge 1$. The restriction to absolutely continuous distribution functions
is a simplifying assumption that allows us to use the probability integral transformation 
(Theorem 5.3.1) and the fact that ties occur with probability 0. 

Section 13.2 is devoted to the problem of unbiased (nonparametric) estimation. We 
develop the theory of U-statistics since many estimators and test statistics may be viewed 
as U-statistics. Sections 13.3 through 13.5 deal with some common hypotheses testing 
problems. In Section 13.6 we investigate applications of order statistics in nonparametric
methods. Section 13.7 considers underlying assumptions in some common parametric
problems and the effect of relaxing these assumptions.


13.2 U-STATISTICS


In Chapter 6 we encountered several nonparametric estimators. For example, the empir- 
ical DF defined in Section 6.3 as an estimator of the population DF is distribution-free, 
and so also are the sample moments as estimators of the population moments. These are 
examples of what are known as U-statistics which lead to unbiased estimators of popula- 
tion characteristics. In this section we study the general theory of U-statistics. Although 
the thrust of this investigation is unbiased estimation, many of the U-statistics defined in 
this section may be used as test statistics. 

Let $X_1, X_2, \ldots, X_n$ be iid RVs with common law $\mathcal{L}(X)$, and let $\mathcal{P}$ be the class of all
possible distributions of $X$ that consists of the absolutely continuous or discrete distributions,
or subclasses of these.


Definition 1. A statistic $T(\mathbf{X})$ is sufficient for the family of distributions $\mathcal{P}$ if the
conditional distribution of $\mathbf{X}$, given $T = t$, is the same whatever the true $F \in \mathcal{P}$.


Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely continuous DF, and
let $T = (X_{(1)}, \ldots, X_{(n)})$ be the order statistic. Then

\[ f(\mathbf{x} \mid T = t) = (n!)^{-1}, \]

and we see that $T$ is sufficient for the family of absolutely continuous distributions on $\mathcal{R}$.


Definition 2. A family of distributions $\mathcal{P}$ is complete if the only unbiased estimator of 0
is the zero function itself, that is,

\[ E_F h(\mathbf{X}) = 0 \ \text{for all } F \in \mathcal{P} \implies h(\mathbf{x}) = 0 \]

for all $\mathbf{x}$ (except for a null set with respect to each $F \in \mathcal{P}$).


Definition 3. A statistic T(X) is said to be complete in relation to a class of distributions 
P if the class of induced distributions of T is complete. 


We have already encountered many examples of complete statistics or complete 
families of distributions in Chapter 8. 


The following result is stated without proof. For the proof we refer to Fraser [32, 
pp. 27-30, 139-142]. 


Theorem 1. The order statistic $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is a complete sufficient statistic
provided that the iid RVs $X_1, X_2, \ldots, X_n$ are of either the discrete or the continuous type.


Definition 4. A real-valued parameter $g(F)$ is said to be estimable if it has an unbiased
estimator, that is, if there exists a statistic $T(\mathbf{X})$ such that

\[ E_F T(\mathbf{X}) = g(F) \quad\text{for all } F \in \mathcal{P}. \tag{1} \]


Example 2. If $\mathcal{P}$ is the class of all distributions for which the second moment exists, $\bar{X}$ is
an unbiased estimator of $\mu(F)$, the population mean. Similarly, $\mu_2(F) = \operatorname{var}_F(X)$ is also
estimable, and an unbiased estimator is $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/(n-1)$. We would like to know
whether $\bar{X}$ and $S^2$ are UMVUEs. Similarly, $F(x)$ and $P_F(X_1 + X_2 > 0)$ are estimable for
$F \in \mathcal{P}$.


Definition 5. The degree $m$ $(m \ge 1)$ of an estimable parameter $g(F)$ is the smallest sample
size for which the parameter is estimable, that is, it is the smallest $m$ such that there exists
an unbiased estimator $T(X_1, X_2, \ldots, X_m)$ with

\[ E_F T = g(F) \quad\text{for all } F \in \mathcal{P}. \]


Example 3. The parameter $g(F) = P_F\{X > c\}$, where $c$ is a known constant, has degree 1.
Also, $\mu(F)$ is estimable with degree 1 (we assume that there is at least one $F \in \mathcal{P}$ such that
$\mu(F) \ne 0$), and $\mu_2(F)$ is estimable with degree $m = 2$, since $\mu_2(F)$ cannot be estimated
(unbiasedly) by one observation only. At least two observations are needed. Similarly,
$\mu^2(F)$ has degree 2, and $P(X_1 + X_2 > 0)$ also is of degree 2.


Definition 6. An unbiased estimator of a parameter based on the smallest sample size
(equal to degree $m$) is called a kernel.


Example 4. Clearly $X_i$, $1 \le i \le n$, is a kernel of $\mu(F)$; $T(X_i) = 1$ if $X_i > c$, and $= 0$ if
$X_i \le c$, is a kernel of $P(X > c)$. Similarly, $T(X_i, X_j) = 1$ if $X_i + X_j > 0$, and $= 0$ otherwise,
is a kernel of $P(X_i + X_j > 0)$, $X_i X_j$ is a kernel of $\mu^2(F)$, and $X_i^2 - X_i X_j$ is a kernel of $\mu_2(F)$.


Lemma 1. There exists a symmetric kernel for every estimable parameter.


Proof. If $T(X_1, X_2, \ldots, X_m)$ is a kernel of $g(F)$, so also is

\[ T_s(X_1, X_2, \ldots, X_m) = \frac{1}{m!} \sum_{P} T(X_{i_1}, X_{i_2}, \ldots, X_{i_m}), \tag{2} \]

where the summation $P$ is over all $m!$ permutations of $\{1, 2, \ldots, m\}$.


Example 5. A symmetric kernel for $\mu_2(F)$ is

\[ T_s(X_i, X_j) = \tfrac{1}{2}\{T(X_i, X_j) + T(X_j, X_i)\} = \tfrac{1}{2}(X_i - X_j)^2, \qquad i, j = 1, 2, \ldots, n\ (i \ne j). \]


Definition 7. Let $g(F)$ be an estimable parameter of degree $m$, and let $X_1, X_2, \ldots, X_n$ be a
sample of size $n$, $n \ge m$. Corresponding to any kernel $T(X_{i_1}, \ldots, X_{i_m})$ of $g(F)$, we define a
U-statistic for the sample by

\[ U(X_1, X_2, \ldots, X_n) = \binom{n}{m}^{-1} \sum_{C} T_s(X_{i_1}, \ldots, X_{i_m}), \tag{3} \]

where the summation $C$ is over all $\binom{n}{m}$ combinations of $m$ integers $(i_1, i_2, \ldots, i_m)$ chosen
from $\{1, 2, \ldots, n\}$, and $T_s$ is the symmetric kernel defined in (2).

Clearly, the U-statistic defined in (3) is symmetric in the $X_i$'s, and

\[ E_F U(\mathbf{X}) = g(F) \quad\text{for all } F. \tag{4} \]

Moreover, $U(\mathbf{X})$ is a function of the complete sufficient statistic $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$. It
follows from Theorem 8.4.6 that it is UMVUE of its expected value.


Example 6. For estimating $\mu(F)$, the U-statistic is $n^{-1} \sum_{i=1}^{n} X_i$. For estimating $\mu_2(F)$,
a symmetric kernel is

\[ T_s(X_{i_1}, X_{i_2}) = \tfrac{1}{2}(X_{i_1} - X_{i_2})^2, \qquad i_1, i_2 = 1, 2, \ldots, n\ (i_1 \ne i_2), \]

so that the corresponding U-statistic is

\[ U(\mathbf{X}) = \binom{n}{2}^{-1} \sum_{i<j} \tfrac{1}{2}(X_i - X_j)^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = S^2. \]

Similarly, for estimating $\mu^2(F)$, a symmetric kernel is $T_s(X_{i_1}, X_{i_2}) = X_{i_1} X_{i_2}$, and the
corresponding U-statistic is

\[ U(\mathbf{X}) = \binom{n}{2}^{-1} \sum_{i<j} X_i X_j = \frac{1}{n(n-1)} \sum_{i \ne j} X_i X_j. \]

For estimating $\mu^3(F)$, a symmetric kernel is $T_s(X_{i_1}, X_{i_2}, X_{i_3}) = X_{i_1} X_{i_2} X_{i_3}$, so that the
corresponding U-statistic is

\[ U(\mathbf{X}) = \binom{n}{3}^{-1} \sum_{i<j<k} X_i X_j X_k. \]

For estimating $F(x)$ a symmetric kernel is $I_{[X_i \le x]}$, so the corresponding U-statistic is

\[ U(\mathbf{X}) = \frac{1}{n} \sum_{i=1}^{n} I_{[X_i \le x]} = F_n^*(x), \]

and for estimating $P(X > 0)$ the U-statistic is

\[ U(\mathbf{X}) = \frac{1}{n} \sum_{i=1}^{n} I_{[X_i > 0]} = 1 - F_n^*(0). \]

Finally, for estimating $P(X_1 + X_2 > 0)$ the U-statistic is

\[ U(\mathbf{X}) = \binom{n}{2}^{-1} \sum_{i<j} I_{[X_i + X_j > 0]}. \]
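
Definition 7 translates directly into a few lines of code. In the sketch below the function name `u_statistic` and the sample values are hypothetical; the kernel passed in is assumed to be symmetric already, and the average is taken over all $\binom{n}{m}$ subsets exactly as in (3).

```python
from itertools import combinations
from math import comb
from statistics import mean, variance


def u_statistic(sample, kernel, m):
    """Average a symmetric kernel of degree m over all C(n, m) subsets of the sample."""
    total = sum(kernel(*subset) for subset in combinations(sample, m))
    return total / comb(len(sample), m)


data = [2.0, -1.0, 4.0, 0.5, 3.0]   # illustrative values only

mean_u = u_statistic(data, lambda x: x, 1)                            # kernel for mu(F)
var_u = u_statistic(data, lambda x, y: 0.5 * (x - y) ** 2, 2)         # kernel for mu_2(F)
prob_u = u_statistic(data, lambda x, y: 1.0 if x + y > 0 else 0.0, 2)  # P(X1 + X2 > 0)

print(mean_u, mean(data))      # the two values agree
print(var_u, variance(data))   # U-statistic for mu_2(F) equals S^2
print(prob_u)
```

The agreement of `var_u` with the sample variance (divisor $n - 1$) illustrates the identity of Example 6. Enumerating all subsets costs on the order of $n^m$ kernel evaluations, so this direct approach is practical only for small $m$.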


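Theorem 2. The variance of the U-statistic defined in (3) is given by

\[ \operatorname{var} U(X_1, \ldots, X_n) = \binom{n}{m}^{-1} \sum_{c=1}^{m} \binom{m}{c} \binom{n-m}{m-c} \zeta_c, \tag{5} \]

where

\[ \zeta_c = \operatorname{cov}_F\{T_s(X_{i_1}, \ldots, X_{i_m}),\ T_s(X_{j_1}, \ldots, X_{j_m})\} \]

with $m$ the degree of $g(F)$, and $c$ is the common number of integers in the sets $\{i_1, \ldots, i_m\}$
and $\{j_1, \ldots, j_m\}$. (For $c = 0$, the two statistics $T_s(X_{i_1}, \ldots, X_{i_m})$ and $T_s(X_{j_1}, \ldots, X_{j_m})$ are
independent and have zero covariance.)


Proof. Clearly

\[ \operatorname{var} U(\mathbf{X}) = \binom{n}{m}^{-2} \sum_{C} \sum_{C} E_F\{[T_s(X_{i_1}, \ldots, X_{i_m}) - g(F)][T_s(X_{j_1}, \ldots, X_{j_m}) - g(F)]\}. \]

Let $c$ be the number of common integers in $\{i_1, i_2, \ldots, i_m\}$ and $\{j_1, j_2, \ldots, j_m\}$. Then $c$ takes
values $0, 1, \ldots, m$ and for $c = 0$, $T_s(X_{i_1}, \ldots, X_{i_m})$ and $T_s(X_{j_1}, \ldots, X_{j_m})$ are independent. It
follows that

\[ \operatorname{var} U(\mathbf{X}) = \binom{n}{m}^{-2} \sum_{c=1}^{m} \binom{n}{m} \binom{m}{c} \binom{n-m}{m-c} \zeta_c, \]

which is (5). The counting argument leading from the double sum to this expression is as
follows: First we select integers $\{i_1, \ldots, i_m\}$ from $\{1, 2, \ldots, n\}$ in $\binom{n}{m}$ ways. Next we select
the integers in $\{j_1, \ldots, j_m\}$. This is done by selecting first the $c$ integers that will be in
$\{i_1, \ldots, i_m\}$ (hence common to both sets) and then the $m - c$ integers from the $n - m$ integers
which are not in $\{i_1, \ldots, i_m\}$. Note that $\zeta_0 = 0$ from independence.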


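Example 7. Consider the U-statistic estimator $\bar{X}$ of $g(F) = \mu(F)$ in Example 6. Here
$m = 1$, $T(x) = x$, and $\zeta_1 = \operatorname{var}(X_1) = \sigma^2$ so that $\operatorname{var}(\bar{X}) = \sigma^2/n$.

For the parameter $g(F) = \mu_2(F)$, $U(\mathbf{X}) = S^2$. In this case, $m = 2$, $T_s(X_{i_1}, X_{i_2}) =
(X_{i_1} - X_{i_2})^2/2$ so

\[ \operatorname{var} U(\mathbf{X}) = \binom{n}{2}^{-1} \{2(n-2)\zeta_1 + \zeta_2\}, \]

where

\[ \zeta_2 = E_F\left\{\left[\tfrac{1}{2}(X_{i_1} - X_{i_2})^2 - \sigma^2\right]^2\right\} = \frac{\mu_4 + \sigma^4}{2} \]

and

\[ \zeta_1 = \operatorname{cov}\left\{\tfrac{1}{2}(X_{i_1} - X_{i_2})^2,\ \tfrac{1}{2}(X_{i_1} - X_{j_2})^2\right\} = \frac{\mu_4 - \sigma^4}{4}, \]

where $i_2 \ne j_2$. Then

\[ \operatorname{var} U(\mathbf{X}) = \frac{2}{n(n-1)} \left[ \frac{(n-2)(\mu_4 - \sigma^4)}{2} + \frac{\mu_4 + \sigma^4}{2} \right] = \frac{1}{n} \left[ \mu_4 - \frac{n-3}{n-1} \sigma^4 \right], \]

which agrees with Corollary 2 to Theorem 6.3.4.


For the parameter $g(F) = F(x)$, $\operatorname{var} U(\mathbf{X}) = F(x)(1 - F(x))/n$, and for $g(F) =
P_F(X_1 + X_2 > 0)$

\[ \operatorname{var} U(\mathbf{X}) = \binom{n}{2}^{-1} \{2(n-2)\zeta_1 + \zeta_2\}, \]

where

\[ \zeta_1 = P_F(X_1 + X_2 > 0,\ X_1 + X_3 > 0) - P_F^2(X_1 + X_2 > 0) \]

and

\[ \zeta_2 = P_F(X_1 + X_2 > 0) - P_F^2(X_1 + X_2 > 0) = P_F(X_1 + X_2 > 0)\, P_F(X_1 + X_2 \le 0). \]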
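
Formula (5) is simple to evaluate once the covariances $\zeta_c$ are known. The sketch below uses a hypothetical helper `u_variance`; as a check it takes the kernel $\tfrac{1}{2}(X_1 - X_2)^2$ under the assumption that $F$ is standard normal, so that $\mu_4 = 3$, $\sigma^4 = 1$, $\zeta_1 = (\mu_4 - \sigma^4)/4 = 1/2$, and $\zeta_2 = (\mu_4 + \sigma^4)/2 = 2$.

```python
from fractions import Fraction
from math import comb


def u_variance(n, zetas):
    """Variance of a U-statistic from formula (5).

    zetas[c - 1] holds zeta_c for c = 1, ..., m, where m = len(zetas).
    """
    m = len(zetas)
    total = sum(comb(m, c) * comb(n - m, m - c) * zetas[c - 1]
                for c in range(1, m + 1))
    return total / comb(n, m)


n = 10
mu4, sigma4 = Fraction(3), Fraction(1)           # standard normal moments (assumed F)
zeta1 = (mu4 - sigma4) / 4
zeta2 = (mu4 + sigma4) / 2

var_s2 = u_variance(n, [zeta1, zeta2])
closed_form = (mu4 - Fraction(n - 3, n - 1) * sigma4) / n

print(var_s2, closed_form)   # both equal 2/(n - 1) for the normal case
```

With $m = 1$ and $\zeta_1 = \sigma^2$ the same function returns $\sigma^2/n$, the variance of $\bar{X}$.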


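Corollary to Theorem 2. Let $U$ be the U-statistic for a symmetric kernel
$T_s(X_1, X_2, \ldots, X_m)$. Suppose $E_F[T_s(X_1, \ldots, X_m)]^2 < \infty$. Then

\[ \lim_{n \to \infty} \{n \operatorname{var} U(\mathbf{X})\} = m^2 \zeta_1. \tag{7} \]


Proof. It is easily shown that $0 \le \zeta_c \le \zeta_m$ for $1 \le c \le m$. It follows from the hypothesis
$\zeta_m = \operatorname{var}[T_s(X_1, \ldots, X_m)] < \infty$ and (5) that $\operatorname{var} U(\mathbf{X}) < \infty$. Now

\[ n \binom{n}{m}^{-1} \binom{m}{c} \binom{n-m}{m-c} \zeta_c = \frac{(m!)^2}{c!\,[(m-c)!]^2} \cdot \frac{n\,[(n-m)!]^2}{n!\,(n-2m+c)!}\, \zeta_c \]
\[ \qquad = \frac{(m!)^2}{c!\,[(m-c)!]^2} \cdot \frac{n(n-m)(n-m-1)\cdots(n-2m+c+1)}{n(n-1)\cdots(n-m+1)}\, \zeta_c. \]

Now note that the numerator has $m - c + 1$ factors involving $n$, while the denominator has
$m$ such factors so that for $c > 1$, the ratio involving $n$ goes to 0 as $n \to \infty$. For $c = 1$, this
ratio $\to 1$ and

\[ n \operatorname{var} U(\mathbf{X}) \to \frac{(m!)^2}{[(m-1)!]^2}\, \zeta_1 = m^2 \zeta_1 \]

as $n \to \infty$.

Example 8. In Example 7, $n \operatorname{var}(\bar{X}) = \sigma^2$ and

\[ n \operatorname{var}(S^2) \to 2^2 \zeta_1 = \mu_4 - \sigma^4 \]

as $n \to \infty$.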


Finally we state, without proof, the following result due to Hoeffding [45], which estab- 
lishes the asymptotic normality of a suitably centered and normed U-statistic. For proof 
we refer to Lehmann [61, pp. 364-365] or Randles and Wolfe [85, p. 82]. 


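Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a DF $F$ and let $g(F)$ be an
estimable parameter of degree $m$ with symmetric kernel $T_s(X_1, X_2, \ldots, X_m)$.

If $E_F\{T_s(X_1, X_2, \ldots, X_m)\}^2 < \infty$ and $U$ is the U-statistic for $g$ (as defined in (3)), then
$\sqrt{n}\,(U(\mathbf{X}) - g(F)) \xrightarrow{L} N(0, m^2 \zeta_1)$, provided

\[ \zeta_1 = \operatorname{cov}_F\{T_s(X_{i_1}, \ldots, X_{i_m}),\ T_s(X_{j_1}, \ldots, X_{j_m})\} > 0. \]

In view of the corollary to Theorem 2, it follows that $(U - g(F))/\sqrt{\operatorname{var}(U)} \xrightarrow{L} N(0, 1)$,
provided $\zeta_1 > 0$.


Example 9 (Example 7 continued). Clearly, $\sqrt{n}(\bar{X} - \mu)/\sigma \xrightarrow{L} N(0, 1)$ as $n \to \infty$ since
$\zeta_1 = \sigma^2 > 0$.

For the parameter $g(F) = \mu_2(F)$, $\operatorname{var} U(\mathbf{X}) = \operatorname{var}(S^2) = \frac{1}{n}\left\{\mu_4 - \frac{n-3}{n-1}\sigma^4\right\}$ and
$\zeta_1 = (\mu_4 - \sigma^4)/4 > 0$, so it follows from Theorem 3 that

\[ \sqrt{n}\,(S^2 - \sigma^2) \xrightarrow{L} N(0, \mu_4 - \sigma^4). \]
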
The concept of U-statistics can be extended to multiple random samples. We will
restrict ourselves to the case of two samples. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be two
independent random samples from DFs $F$ and $G$, respectively.


Definition 8. A parameter $g(F, G)$ is estimable of degrees $(m_1, m_2)$ if $m_1$ and $m_2$ are the
smallest sample sizes for which there exists a statistic $T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2})$ such that

\[ E_{F,G}\, T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) = g(F, G) \tag{8} \]

for all $F, G \in \mathcal{P}$.

The statistic $T$ in Definition 8 is called a kernel of $g$ and a symmetrized version of $T$,
$T_s$, is called a symmetric kernel of $g$. Without loss of generality therefore we assume that
the two-sample kernel $T$ in (8) is a symmetric kernel.


Definition 9. Let $g(F, G)$, $F, G \in \mathcal{P}$, be an estimable parameter of degree $(m_1, m_2)$. Then,
a (two-sample) U-statistic estimate of $g$ is defined by

\[ U(\mathbf{X}; \mathbf{Y}) = \binom{n_1}{m_1}^{-1} \binom{n_2}{m_2}^{-1} \sum_{i \in A} \sum_{j \in B} T(X_{i_1}, \ldots, X_{i_{m_1}}; Y_{j_1}, \ldots, Y_{j_{m_2}}), \tag{9} \]

where $A$ and $B$ are collections of all subsets of $m_1$ and $m_2$ integers chosen without
replacement from the sets $\{1, 2, \ldots, n_1\}$ and $\{1, 2, \ldots, n_2\}$, respectively.


Example 10. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be two independent random samples
from DFs $F$ and $G$, respectively. Let

\[ g(F, G) = P(X < Y) = \int_{-\infty}^{\infty} F(x) g(x)\, dx = \int_{-\infty}^{\infty} P(Y > y) f(y)\, dy, \]

where $f$ and $g$ are the respective PDFs of $F$ and $G$. Then

\[ T(X_i; Y_j) = \begin{cases} 1, & \text{if } X_i < Y_j, \\ 0, & \text{if } X_i \ge Y_j, \end{cases} \]

is an unbiased estimator of $g$. Clearly, $g$ has degree $(1, 1)$ and the two-sample U-statistic
is given by

\[ U(\mathbf{X}; \mathbf{Y}) = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} T(X_i; Y_j). \]

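Theorem 4. The variance of the two-sample U-statistic defined in (9) is given by

\[ \operatorname{var} U(\mathbf{X}; \mathbf{Y}) = \binom{n_1}{m_1}^{-1} \binom{n_2}{m_2}^{-1} \sum_{c=0}^{m_1} \sum_{d=0}^{m_2} \binom{m_1}{c} \binom{n_1 - m_1}{m_1 - c} \binom{m_2}{d} \binom{n_2 - m_2}{m_2 - d} \zeta_{c,d}, \tag{10} \]

where $\zeta_{c,d}$ is the covariance between $T(X_{i_1}, \ldots, X_{i_{m_1}}; Y_{j_1}, \ldots, Y_{j_{m_2}})$ and $T(X_{k_1}, \ldots, X_{k_{m_1}};
Y_{\ell_1}, \ldots, Y_{\ell_{m_2}})$ with exactly $c$ $X$'s and $d$ $Y$'s in common.


Corollary. Suppose $E_{F,G}\, T^2(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) < \infty$ for all $F, G \in \mathcal{P}$. Let $N = n_1 +
n_2$ and suppose $n_1, n_2, N \to \infty$ such that $n_1/N \to \lambda$, $n_2/N \to 1 - \lambda$. Then

\[ \lim_{N \to \infty} N \operatorname{var} U(\mathbf{X}; \mathbf{Y}) = \frac{m_1^2}{\lambda}\, \zeta_{1,0} + \frac{m_2^2}{1 - \lambda}\, \zeta_{0,1}. \tag{11} \]


The proofs of Theorem 4 and its corollary parallel those of Theorem 2 and its corollary
and are left to the reader.


Example 11. For the U-statistic in Example 10

\[ E_{F,G}\, U^2(\mathbf{X}; \mathbf{Y}) = \frac{1}{n_1^2 n_2^2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \sum_{k=1}^{n_1} \sum_{\ell=1}^{n_2} E_{F,G}\{T(X_i; Y_j)\, T(X_k; Y_\ell)\}. \]

Now

\[ E_{F,G}\{T(X_i; Y_j)\, T(X_k; Y_\ell)\} = P(X_i < Y_j,\ X_k < Y_\ell) \]
\[ \qquad = \begin{cases}
\int_{-\infty}^{\infty} F(x) g(x)\, dx & \text{for } i = k,\ j = \ell, \\
\int_{-\infty}^{\infty} [1 - G(x)]^2 f(x)\, dx & \text{for } i = k,\ j \ne \ell, \\
\int_{-\infty}^{\infty} [F(x)]^2 g(x)\, dx & \text{for } i \ne k,\ j = \ell, \\
\left[\int_{-\infty}^{\infty} F(x) g(x)\, dx\right]^2 & \text{for } i \ne k,\ j \ne \ell,
\end{cases} \]

where $f$ and $g$ are PDFs of $F$ and $G$, respectively. Moreover,

\[ \zeta_{1,0} = \int_{-\infty}^{\infty} [1 - G(x)]^2 f(x)\, dx - [g(F, G)]^2 \]

and

\[ \zeta_{0,1} = \int_{-\infty}^{\infty} [F(x)]^2 g(x)\, dx - [g(F, G)]^2. \]

It follows that

\[ \operatorname{var} U(\mathbf{X}; \mathbf{Y}) = \frac{1}{n_1 n_2} \{ g(F, G)[1 - g(F, G)] + (n_2 - 1)\zeta_{1,0} + (n_1 - 1)\zeta_{0,1} \}. \]

In the special case when $F = G$, $g(F, G) = 1/2$, $\zeta_{1,0} = \zeta_{0,1} = 1/3 - 1/4 = 1/12$, and
$\operatorname{var} U = (n_1 + n_2 + 1)/[12 n_1 n_2]$.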
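
The special case $F = G$ can be checked by brute force for small samples. When $F = G$ is continuous, every assignment of the ranks $1, \ldots, n_1 + n_2$ to the $X$-sample is equally likely, so the exact null distribution of $U$ is obtained by enumeration. The function names in this sketch are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def two_sample_u(xs, ys):
    """Two-sample U-statistic with kernel T(x; y) = 1 if x < y, else 0."""
    count = sum(1 for x in xs for y in ys if x < y)
    return Fraction(count, len(xs) * len(ys))


def null_moments(n1, n2):
    """Exact mean and variance of U under F = G, by enumerating rank assignments."""
    ranks = range(n1 + n2)
    values = []
    for x_ranks in combinations(ranks, n1):
        y_ranks = [r for r in ranks if r not in x_ranks]
        values.append(two_sample_u(x_ranks, y_ranks))
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


n1, n2 = 3, 4
mean_u, var_u = null_moments(n1, n2)
formula = Fraction(n1 + n2 + 1, 12 * n1 * n2)
print(mean_u, var_u, formula)
```

The enumerated variance coincides with $(n_1 + n_2 + 1)/[12 n_1 n_2]$ and the mean with $g(F, G) = 1/2$. The enumeration grows as $\binom{n_1 + n_2}{n_1}$ and is meant only as a check for small sample sizes.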


Finally we state, without proof, the two-sample analog of Theorem 3 which establishes
the asymptotic normality of the two-sample U-statistic defined in (9).


Theorem 5. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be independent random samples from
DFs $F$ and $G$, respectively, and let $g(F, G)$ be an estimable parameter of degree $(m_1, m_2)$.
Let $T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2})$ be a symmetric kernel for $g$ such that $E T^2 < \infty$. Then

\[ \sqrt{n_1 + n_2}\, \{U(\mathbf{X}; \mathbf{Y}) - g(F, G)\} \xrightarrow{L} N(0, \sigma^2), \]

where $\sigma^2 = \dfrac{m_1^2}{\lambda}\, \zeta_{1,0} + \dfrac{m_2^2}{1 - \lambda}\, \zeta_{0,1}$, provided $\sigma^2 > 0$ and $0 < \lambda = \lim_{N \to \infty} (n_1/N) < 1$,
$N = n_1 + n_2$.


We see that $(U - g)/\sqrt{\operatorname{var} U} \xrightarrow{L} N(0, 1)$, provided $\sigma^2 > 0$.


For the proof of Theorem 5 we refer to Lehmann [61, p. 364], or Randles and Wolfe [85,
p. 92].


Example 11 (Continued). In Example 11 we saw that in the special case when $F = G$,
$\zeta_{1,0} = \zeta_{0,1} = 1/12$, and $\operatorname{var} U = (n_1 + n_2 + 1)/[12 n_1 n_2]$. It follows that

\[ \frac{U(\mathbf{X}; \mathbf{Y}) - (1/2)}{\sqrt{(n_1 + n_2 + 1)/[12 n_1 n_2]}} \xrightarrow{L} N(0, 1). \]


PROBLEMS 13.2


1. Let $(\mathcal{R}, \mathfrak{B}, P_\theta)$ be a probability space, and let $\mathcal{P} = \{P_\theta\colon \theta \in \Theta\}$. Let $A$ be a Borel
subset of $\mathcal{R}$, and consider the parameter $d(\theta) = P_\theta(A)$. Is $d$ estimable? If so, what is
the degree? Find the UMVUE for $d$, based on a sample of size $n$, assuming that $\mathcal{P}$ is
the class of all continuous distributions.

2. Let X1,X2,...,X, and Y,,Y¥2,...,Y, be independent random samples from two 
absolutely continuous DFs. Find the UMVUEs of (a) E{XY} and (b) var(X + Y). 

3. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from an absolutely continuous
distribution. Find the UMVUEs of (a) $E(XY)$ and (b) $\operatorname{var}(X + Y)$.

4. Let T(X1,X2,...,X,) be a statistic that is symmetric in the observations. Show that 
T can be written as a function of the order statistic. Conversely, if T(X1,X2,...,Xn) 
can be written as a function of the order statistic, T is symmetric in the observations. 

5. Let X,,X2,...,X, be arandom sample from an absolutely continuous DF F, F € P. 
Find U-statistics for g(F) = w?(F) and go(F) = p3(F). Find the corresponding 
expressions for the variance of the U-statistic in each case. 

6. In Example 3, show that $\mu_2(F)$ is not estimable with one observation. That is, show
that the degree of $\mu_2(F)$ where $F \in \mathcal{P}$, the class of all distributions with finite second
moment, is 2.

7. Show that for $c = 1, 2, \ldots, m$, $0 \le \zeta_c \le \zeta_m$.


8. Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely continuous DF $F$, $F \in \mathcal{P}$.
Let

\[ g(F) = E_F |X_1 - X_2|. \]

Find the U-statistic estimator of $g(F)$ and its variance.


13.3 SOME SINGLE-SAMPLE PROBLEMS


Let $X_1, X_2, \ldots, X_n$ be a random sample from a DF $F$. In Section 13.2 we studied properties
of U-statistics as nonparametric estimators of parameters $g(F)$. In this section we consider
some nonparametric tests of hypotheses. Often the test statistic may be viewed as a
function of a U-statistic.


13.3.1 Goodness-of-Fit Problem


The problem of fit is to test the hypothesis that the sample comes from a specified DF
$F_0$ against the alternative that it is from some other DF $F$, where $F(x) \ne F_0(x)$ for some
$x \in \mathcal{R}$. In Section 10.3 we studied the chi-square test of goodness of fit for testing $H_0\colon
X_i \sim F_0$. Here we consider the Kolmogorov-Smirnov test of $H_0$. Since $H_0$ concerns the
underlying DF of the $X$'s, it is natural to compare the U-statistic estimator of $g(F) =
F(x)$ with the specified DF $F_0$ under $H_0$. The U-statistic for $g(F) = F(x)$ is the empirical
DF $F_n^*(x)$.

Definition 1. Let $X_1, X_2, \ldots, X_n$ be a sample from a DF $F$, and let $F_n^*$ be a corresponding
empirical DF. The statistic

\[ D_n = \sup_x |F_n^*(x) - F(x)| \tag{1} \]

is called the (two-sided) Kolmogorov-Smirnov statistic. We write

\[ D_n^+ = \sup_x [F_n^*(x) - F(x)] \tag{2} \]

and

\[ D_n^- = \sup_x [F(x) - F_n^*(x)] \tag{3} \]

and call $D_n^+$, $D_n^-$ the one-sided Kolmogorov-Smirnov statistics.


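Theorem 1. The statistics $D_n$, $D_n^-$, $D_n^+$ are distribution-free for any continuous DF $F$.


Proof. Clearly, $D_n = \max(D_n^+, D_n^-)$. Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ be the order statistics
of $X_1, X_2, \ldots, X_n$, and define $X_{(0)} = -\infty$, $X_{(n+1)} = +\infty$. Then

$$F_n^*(x) = \frac{i}{n} \quad \text{for } X_{(i)} \le x < X_{(i+1)}, \quad i = 0, 1, 2, \ldots, n,$$

and we have

$$
\begin{aligned}
D_n^+ &= \max_{0 \le i \le n} \; \sup_{X_{(i)} \le x < X_{(i+1)}} \left[ \frac{i}{n} - F(x) \right] \\
      &= \max_{0 \le i \le n} \left\{ \frac{i}{n} - \inf_{X_{(i)} \le x < X_{(i+1)}} F(x) \right\} \\
      &= \max_{0 \le i \le n} \left\{ \frac{i}{n} - F(X_{(i)}) \right\} \\
      &= \max \left\{ \max_{1 \le i \le n} \left[ \frac{i}{n} - F(X_{(i)}) \right], 0 \right\}.
\end{aligned}
$$

Since $F(X_{(i)})$ is the $i$th-order statistic of a sample from $U(0,1)$ irrespective of what $F$ is, as
long as it is continuous, we see that the distribution of $D_n^+$ is independent of $F$. Similarly,

$$D_n^- = \max \left\{ \max_{1 \le i \le n} \left[ F(X_{(i)}) - \frac{i-1}{n} \right], 0 \right\},$$

and the result follows.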


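Without loss of generality, therefore, we assume that $F$ is the DF of a $U(0,1)$ RV.

Theorem 2. If $F$ is continuous, then

$$
P\left\{ D_n < v + \frac{1}{2n} \right\} =
\begin{cases}
0 & \text{if } v \le 0, \\[4pt]
\displaystyle \int_{(1/2n)-v}^{v+(1/2n)} \int_{(3/2n)-v}^{v+(3/2n)} \cdots \int_{[(2n-1)/2n]-v}^{v+[(2n-1)/2n]} f(u_1, u_2, \ldots, u_n) \prod_{i=1}^{n} du_i & \text{if } 0 < v < \dfrac{2n-1}{2n}, \\[4pt]
1 & \text{if } v \ge \dfrac{2n-1}{2n},
\end{cases}
\tag{4}
$$

where

$$
f(u_1, u_2, \ldots, u_n) =
\begin{cases}
n!, & 0 < u_1 < \cdots < u_n < 1, \\
0, & \text{otherwise},
\end{cases}
\tag{5}
$$

is the joint PDF of the set of order statistics for a sample of size $n$ from $U(0,1)$.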


We will not prove this result here. Let $D_{n,\alpha}$ be the upper $\alpha$-percent point of the distribution
of $D_n$, that is, $P\{D_n > D_{n,\alpha}\} \le \alpha$. The exact distribution of $D_n$ for selected values of $n$
and $\alpha$ has been tabulated by Miller [74], Owen [79], and Birnbaum [9]. The large-sample
distribution of $D_n$ was derived by Kolmogorov [53], and we state it without proof.


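Theorem 3. Let $F$ be any continuous DF. Then for every $z \ge 0$

$$\lim_{n \to \infty} P\{D_n \le z n^{-1/2}\} = L(z), \tag{6}$$

where

$$L(z) = 1 - 2 \sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 z^2}. \tag{7}$$

Theorem 3 can be used to find $d_\alpha$ such that $\lim_{n \to \infty} P\{\sqrt{n}\, D_n \le d_\alpha\} = 1 - \alpha$. Tables
of $d_\alpha$ for various values of $\alpha$ are also available in Owen [79].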

The statistics $D_n^+$ and $D_n^-$ have the same distribution because of symmetry, and their
common distribution is given by the following theorem.


Theorem 4. Let $F$ be a continuous DF. Then

$$
P\{D_n^+ < z\} =
\begin{cases}
0 & \text{if } z \le 0, \\[4pt]
\displaystyle \int_{1-z}^{1} \int_{[(n-1)/n]-z}^{u_n} \cdots \int_{(2/n)-z}^{u_3} \int_{(1/n)-z}^{u_2} f(u_1, u_2, \ldots, u_n) \prod_{i=1}^{n} du_i & \text{if } 0 < z < 1, \\[4pt]
1 & \text{if } z \ge 1,
\end{cases}
\tag{8}
$$

where $f$ is given by (5).

Proof. We leave the reader to prove Theorem 4.


Tables for the critical values $D_{n,\alpha}^+$, where $P\{D_n^+ > D_{n,\alpha}^+\} \le \alpha$, are also available for
selected values of $n$ and $\alpha$; see Birnbaum and Tingey [8]. Table ST7 at the end of this
book gives $D_{n,\alpha}^+$ and $D_{n,\alpha}$ for some selected values of $n$ and $\alpha$. For large samples Smirnov
[108] showed that

$$\lim_{n \to \infty} P\{\sqrt{n}\, D_n^+ \le z\} = 1 - e^{-2z^2}, \quad z \ge 0. \tag{9}$$

In fact, in view of (9), the statistic $V_n = 4n (D_n^+)^2$ has a limiting $\chi^2(2)$ distribution, for
$4n (D_n^+)^2 \le 4z^2$ if and only if $\sqrt{n}\, D_n^+ \le z$, $z \ge 0$, and the result follows since

$$\lim_{n \to \infty} P\{V_n \le 4z^2\} = 1 - e^{-2z^2}, \quad z \ge 0,$$

so that

$$\lim_{n \to \infty} P\{V_n \le x\} = 1 - e^{-x/2}, \quad x \ge 0,$$

which is the DF of a $\chi^2(2)$ RV.


Example 1. Let $\alpha = 0.01$, and let us approximate $D_{n,0.01}^+$. We have $\chi^2_{2,0.01} = 9.21$. Thus
$V_n = 9.21$, yielding

$$D_{n,0.01}^+ = \sqrt{\frac{9.21}{4n}} = \frac{3.03}{2\sqrt{n}}.$$

If, for example, $n = 9$, then $D_{9,0.01}^+ = 3.03/6 = 0.50$. Of course, the approximation is better
for large $n$.
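
The approximation in Example 1 can also be obtained by solving (9) directly for $z$. The following is a minimal sketch, not a replacement for the exact tables; the function name `approx_one_sided_ks_critical` is an illustrative assumption.

```python
import math


def approx_one_sided_ks_critical(n: int, alpha: float) -> float:
    """Large-sample approximation to D+_{n,alpha} based on the limit law (9)."""
    # Solve 1 - exp(-2 z^2) = 1 - alpha for z, then divide by sqrt(n).
    z = math.sqrt(-math.log(alpha) / 2.0)
    return z / math.sqrt(n)


# Example 1: alpha = 0.01, n = 9 (compare with 3.03/6 in the text).
print(round(approx_one_sided_ks_critical(9, 0.01), 3))
```

Because $-\ln\alpha / 2 = \chi^2_{2,\alpha}/4$, this is the same calculation as in Example 1, carried out without the chi-square table.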


The statistic $D_n$ and its one-sided analogs can be used in testing $H_0: X \sim F_0$ against
$H_1: X \sim F$, where $F_0(x) \ne F(x)$ for some $x$.


Definition 2. To test $H_0: F(x) = F_0(x)$ for all $x$ at level $\alpha$, the Kolmogorov-Smirnov test
rejects $H_0$ if $D_n > D_{n,\alpha}$. Similarly, it rejects $F(x) \ge F_0(x)$ for all $x$ if $D_n^- > D_{n,\alpha}^+$, and rejects
$F(x) \le F_0(x)$ for all $x$ at level $\alpha$ if $D_n^+ > D_{n,\alpha}^+$.


For large samples we can approximate by using Theorem 3 or (9) to obtain an
approximate $\alpha$-level test.


Example 2. Let us consider the data in Example 10.3.3, and apply the Kolmogorov-Smirnov
test to determine the goodness of the fit. Rearranging the data in increasing order
of magnitude, we have the following result:

     x      F_0(x)   F*_20(x)   i/20 - F_0(x)   F_0(x) - (i-1)/20

  -1.787    0.0367     1/20        0.0133            0.0367
  -1.229    0.1093     2/20       -0.0093            0.0593
  -0.525    0.2998     3/20       -0.1498            0.1998
  -0.513    0.3050     4/20       -0.1050            0.1550
  -0.508    0.3050     5/20       -0.0550            0.1050
  -0.486    0.3121     6/20       -0.0121            0.0621
  -0.482    0.3156     7/20        0.0344            0.0156
  -0.323    0.3745     8/20        0.0255            0.0245
  -0.261    0.3974     9/20        0.0526           -0.0026
  -0.068    0.4721    10/20        0.0279            0.0221
  -0.057    0.4761    11/20        0.0739           -0.0239
   0.137    0.5557    12/20        0.0443            0.0057
   0.464    0.6772    13/20       -0.0272            0.0772
   0.595    0.7257    14/20       -0.0257            0.0757
   0.881    0.8106    15/20       -0.0606            0.1106
   0.906    0.8186    16/20       -0.0186            0.0686
   1.046    0.8531    17/20       -0.0031            0.0531
   1.237    0.8925    18/20        0.0075            0.0425
   1.678    0.9535    19/20       -0.0035            0.0535
   2.455    0.9931    20/20        0.0069            0.0431


From Theorem 1,
$D_{20}^- = 0.1998$, $D_{20}^+ = 0.0739$, and $D_{20} = \max(D_{20}^+, D_{20}^-) = 0.1998$.


Let us take $\alpha = 0.05$. Then $D_{20,0.05} = 0.294$. Since $0.1998 < 0.294$, we accept $H_0$ at the
0.05 level of significance.
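
The computation in Example 2 follows the formulas in the proof of Theorem 1 and can be sketched in a few lines. The helper names `std_normal_cdf` and `ks_statistics` are illustrative assumptions. The sketch evaluates the standard normal DF exactly, so its results may differ in the third decimal place from the table, which uses rounded table values of $F_0$.

```python
import math


def std_normal_cdf(x: float) -> float:
    """DF of the standard normal distribution, the hypothesized F_0 here."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistics(sample, cdf):
    """Return (D+, D-, D) using the order-statistic formulas of Theorem 1."""
    xs = sorted(sample)
    n = len(xs)
    # D+ = max{ max_i [i/n - F(x_(i))], 0 }, with i = 1, ..., n
    d_plus = max(max((i + 1) / n - cdf(x) for i, x in enumerate(xs)), 0.0)
    # D- = max{ max_i [F(x_(i)) - (i-1)/n], 0 }
    d_minus = max(max(cdf(x) - i / n for i, x in enumerate(xs)), 0.0)
    return d_plus, d_minus, max(d_plus, d_minus)


data = [-1.787, -1.229, -0.525, -0.513, -0.508, -0.486, -0.482, -0.323,
        -0.261, -0.068, -0.057, 0.137, 0.464, 0.595, 0.881, 0.906,
        1.046, 1.237, 1.678, 2.455]
d_plus, d_minus, d_n = ks_statistics(data, std_normal_cdf)
print(round(d_plus, 4), round(d_minus, 4), round(d_n, 4))
```

The resulting $D_{20}$ is then compared with the tabulated critical value $D_{20,0.05} = 0.294$, exactly as in the example.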


It is worthwhile to compare the chi-square test of goodness of fit and the Kolmogorov— 
Smirnov test. The latter treats individual observations directly, whereas the former 
discretizes the data and sometimes loses information through grouping. Moreover, the 
Kolmogorov—Smirnov test is applicable even in the case of very small samples, but the 
chi-square test is essentially for large samples. 

The chi-square test can be applied when the data are discrete or continuous, but the
Kolmogorov-Smirnov test assumes continuity of the DF. This means that the latter test
provides a more refined analysis of the data. If the distribution is actually discontinuous,
the Kolmogorov-Smirnov test is conservative in that it favors $H_0$.

We next turn our attention to some other uses of the Kolmogorov-Smirnov statistic.
Let $X_1, X_2, \ldots, X_n$ be a sample from a DF $F$, and let $F_n^*$ be the sample DF. The estimate $F_n^*$
of $F$ for large $n$ should be close to $F$. Indeed,

$$P\left\{ |F_n^*(x) - F(x)| \le \lambda \sqrt{\frac{F(x)[1 - F(x)]}{n}} \right\} \ge 1 - \frac{1}{\lambda^2} \tag{10}$$

and, since $F(x)[1 - F(x)] \le \frac{1}{4}$, we have

$$P\left\{ |F_n^*(x) - F(x)| \le \frac{\lambda}{2\sqrt{n}} \right\} \ge 1 - \frac{1}{\lambda^2}. \tag{11}$$

Thus $F_n^*$ can be made close to $F$ with high probability by choosing $\lambda$ and large enough $n$.
The Kolmogorov-Smirnov statistic enables us to determine the smallest $n$ such that the
error in estimation never exceeds a fixed value $\varepsilon$ with a large probability $1 - \alpha$. Since

$$P\{D_n \le \varepsilon\} \ge 1 - \alpha, \tag{12}$$

$\varepsilon = D_{n,\alpha}$; and, given $\varepsilon$ and $\alpha$, we can read $n$ from the tables. For large $n$, we can use the
asymptotic distribution of $D_n$ and solve $d_\alpha = \varepsilon \sqrt{n}$ for $n$.
We can also form confidence bounds for $F$. Given $\alpha$ and $n$, we first find $D_{n,\alpha}$ such that

$$P\{D_n > D_{n,\alpha}\} \le \alpha, \tag{13}$$

which is the same as

$$P\left\{ \sup_x |F_n^*(x) - F(x)| \le D_{n,\alpha} \right\} \ge 1 - \alpha.$$

Thus

$$P\{ |F_n^*(x) - F(x)| \le D_{n,\alpha} \text{ for all } x \} \ge 1 - \alpha. \tag{14}$$

Define

$$L_n(x) = \max\{F_n^*(x) - D_{n,\alpha}, 0\} \tag{15}$$

and

$$U_n(x) = \min\{F_n^*(x) + D_{n,\alpha}, 1\}. \tag{16}$$

Then the region between $L_n(x)$ and $U_n(x)$ can be used as a confidence band for $F(x)$ with
associated confidence coefficient $1 - \alpha$.
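
The band defined by (15) and (16) is easy to evaluate once the critical value has been read from the tables. The following is a minimal sketch; the function name `ks_confidence_band` and the evaluation grid are illustrative assumptions, and the critical value $D_{20,0.05} = 0.294$ is the one quoted in Example 2.

```python
def ks_confidence_band(sample, d_crit, grid):
    """Evaluate L_n(x) and U_n(x) of (15)-(16) at each point of `grid`.

    `d_crit` is the tabulated critical value D_{n,alpha}; it is supplied by
    the caller and is not computed here.
    """
    n = len(sample)
    band = []
    for x in grid:
        f_star = sum(1 for v in sample if v <= x) / n  # empirical DF at x
        lower = max(f_star - d_crit, 0.0)
        upper = min(f_star + d_crit, 1.0)
        band.append((x, lower, upper))
    return band


data = [-1.787, -1.229, -0.525, -0.513, -0.508, -0.486, -0.482, -0.323,
        -0.261, -0.068, -0.057, 0.137, 0.464, 0.595, 0.881, 0.906,
        1.046, 1.237, 1.678, 2.455]
band = ks_confidence_band(data, d_crit=0.294, grid=[-2.0, 0.0, 3.0])
for x, lower, upper in band:
    print(x, round(lower, 3), round(upper, 3))
```

Truncating at 0 and 1 costs nothing in coverage, since $F$ itself always lies between 0 and 1.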


Example 3. For the data on the standard normal distribution of Example 2, let us form
a 0.90 confidence band for the DF. We have $D_{20,0.10} = 0.265$. The confidence band is,
therefore, $F_{20}^*(x) \pm 0.265$ as long as the band is between 0 and 1.


13.3.2 Problem of Location


Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from some unknown DF $F$. Let $p$ be a positive
real number, $0 < p < 1$, and let $\mathfrak{z}_p(F)$ denote the quantile of order $p$ for the DF $F$. In the
following analysis we assume that $F$ is absolutely continuous. The problem of location
is to test $H_0: \mathfrak{z}_p(F) = \mathfrak{z}_0$, $\mathfrak{z}_0$ a given number, against one of the alternatives $\mathfrak{z}_p(F) > \mathfrak{z}_0$,
$\mathfrak{z}_p < \mathfrak{z}_0$, and $\mathfrak{z}_p \ne \mathfrak{z}_0$. The problem of location and symmetry is to test $H_0': \mathfrak{z}_{0.5}(F) = \mathfrak{z}_0$
and $F$ is symmetric against $H_1': \mathfrak{z}_{0.5}(F) \ne \mathfrak{z}_0$ or $F$ is not symmetric.

We consider two tests of location. First, we describe the sign test. 


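13.3.2.1 The Sign Test. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF $f$. Consider
the hypothesis testing problem

$$H_0: \mathfrak{z}_p(f) = \mathfrak{z}_0 \quad \text{against} \quad H_1: \mathfrak{z}_p(f) > \mathfrak{z}_0, \tag{17}$$

where $\mathfrak{z}_p(f)$ is the quantile of order $p$ of PDF $f$, $0 < p < 1$. Let $g(F) = P(X_i > \mathfrak{z}_0) =
P(X_i - \mathfrak{z}_0 > 0)$. Then the corresponding U-statistic is given by

$$nU(\mathbf{X}) = R^+(\mathbf{X}),$$

the number of positive elements in $X_1 - \mathfrak{z}_0, X_2 - \mathfrak{z}_0, \ldots, X_n - \mathfrak{z}_0$. Clearly, $P(X_i = \mathfrak{z}_0) = 0$.
Fraser [32, pp. 167-170] has shown that a UMP test of $H_0$ against $H_1$ is given by

$$
\varphi(\mathbf{x}) =
\begin{cases}
1, & R^+(\mathbf{x}) > c, \\
\gamma, & R^+(\mathbf{x}) = c, \\
0, & R^+(\mathbf{x}) < c,
\end{cases}
\tag{18}
$$

where $c$ and $\gamma$ are chosen from the size restriction

$$\alpha = \sum_{i=c+1}^{n} \binom{n}{i} (1-p)^i p^{n-i} + \gamma \binom{n}{c} (1-p)^c p^{n-c}. \tag{19}$$

Note that, under $H_0$, $\mathfrak{z}_p(f) = \mathfrak{z}_0$, so that $P_{H_0}(X \le \mathfrak{z}_0) = p$ and $R^+(\mathbf{X}) \sim b(n, 1-p)$. The
same test is UMP for $H_0: \mathfrak{z}_p(f) \le \mathfrak{z}_0$ against $H_1: \mathfrak{z}_p(f) > \mathfrak{z}_0$. For the two-sided case,
Fraser [32, p. 171] shows that the two-sided sign test is UMP unbiased.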

If, in particular, $\mathfrak{z}_0$ is the median of $f$, then $p = 1/2$ under $H_0$. In this case one can also
use the sign test to test $H_0$: $\text{med}(X) = \mathfrak{z}_0$, $F$ is symmetric.

For large $n$ one can use the normal approximation to binomial to find $c$ and $\gamma$ in (19).


Example 4. Entering college freshmen have taken a particular high school achievement
test for many years, and the upper quartile ($p = 0.75$) is well established at a score of 195.
A particular high school sent 12 of its graduates to college, where they took the examination
and obtained scores of 203, 168, 187, 235, 197, 163, 214, 233, 179, 185, 197, 216.
Let us test the null hypothesis $H_0$ that $\mathfrak{z}_{0.75} \le 195$ against $H_1: \mathfrak{z}_{0.75} > 195$ at the $\alpha = 0.05$
level.

We have to find $c$ and $\gamma$ such that

$$\sum_{i=c+1}^{12} \binom{12}{i} \left(\frac{1}{4}\right)^i \left(\frac{3}{4}\right)^{12-i} + \gamma \binom{12}{c} \left(\frac{1}{4}\right)^c \left(\frac{3}{4}\right)^{12-c} = 0.05.$$

From the table of cumulative binomial distribution (Table ST1) for $n = 12$, $p = \frac{1}{4}$ we see
that $c = 6$. Then $\gamma$ is given by

$$0.0142 + \gamma \binom{12}{6} \left(\frac{1}{4}\right)^6 \left(\frac{3}{4}\right)^6 = 0.05.$$

Thus

$$\gamma = \frac{0.0358}{0.0402} = 0.89.$$

In our case the number of positive signs, $x_i - 195$, $i = 1, 2, \ldots, 12$, is 7, so we reject $H_0$
that the upper quartile is $\le 195$.
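
The constants $c$ and $\gamma$ in (19) can also be found by accumulating binomial tail probabilities instead of reading a table. The following is a minimal sketch under the null distribution $R^+ \sim b(n, 1-p)$; the function names are illustrative assumptions.

```python
from math import comb


def binom_pmf(n: int, k: int, q: float) -> float:
    """P(R = k) for R ~ b(n, q)."""
    return comb(n, k) * q**k * (1.0 - q) ** (n - k)


def sign_test_constants(n: int, p: float, alpha: float):
    """Return (c, gamma) satisfying the size restriction (19).

    Under H0 the number of positive differences R+ is b(n, 1 - p).
    """
    q = 1.0 - p
    tail = 0.0  # running value of P(R+ > c)
    for c in range(n, -1, -1):
        pmf_c = binom_pmf(n, c, q)
        if tail + pmf_c > alpha:
            # P(R+ > c) <= alpha < P(R+ >= c): randomize on {R+ = c}
            return c, (alpha - tail) / pmf_c
        tail += pmf_c
    raise ValueError("alpha must be smaller than 1")


# Example 4: n = 12, p = 0.75, alpha = 0.05
c, gamma = sign_test_constants(12, 0.75, 0.05)
print(c, round(gamma, 2))
```

The loop stops at the largest $c$ for which $P(R^+ \ge c)$ exceeds $\alpha$, which is exactly the boundary point on which the test randomizes.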


Example 5. A random sample of size 8 is taken from a normal population with mean 0
and variance 1. The sample values are -0.465, 0.120, -0.238, -0.869, -1.016, 0.417,
0.056, 0.561. Let us test hypothesis $H_0: \mu = -1.0$ against $H_1: \mu > -1.0$. We should
expect to reject $H_0$ since we know that it is false. The number of observations, $x_i - \mu_0 =
x_i + 1.0$, that are $\ge 0$ is 7. We have to find $c$ and $\gamma$ such that

$$
\begin{aligned}
\sum_{i=c+1}^{8} \binom{8}{i} \left(\frac{1}{2}\right)^8 + \gamma \binom{8}{c} \left(\frac{1}{2}\right)^8 &= 0.05, \text{ say}, \\
\sum_{i=c+1}^{8} \binom{8}{i} + \gamma \binom{8}{c} &= 12.8.
\end{aligned}
$$

We see that $c = 6$ and $\gamma = 0.13$. Since the number of positive $x_i - \mu_0$ is $> 6$, we reject $H_0$.
Let us now apply the parametric test here. We have

$$\sum_{i=1}^{8} x_i = -1.434,$$

that is,

$$\bar{x} = \frac{-1.434}{8} = -0.179.$$

Since $\sigma = 1$, we reject $H_0$ if

$$\bar{x} > \mu_0 + \frac{1}{\sqrt{n}} z_\alpha = -1.0 + \frac{1}{\sqrt{8}} (1.64) = -0.42.$$


Since $-0.179 > -0.42$, we reject $H_0$.

The single-sample sign test described above can easily be modified to apply to sampling
from a bivariate population. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a
bivariate population. Let $Z_i = X_i - Y_i$, $i = 1, 2, \ldots, n$, and assume that $Z_i$ has an absolutely
continuous DF. Then one can test hypotheses concerning the order parameters of $Z$ by
using the sign test. A hypothesis of interest here is that $Z$ has a given median $\mathfrak{z}_0$. Without
loss of generality let $\mathfrak{z}_0 = 0$. Then $H_0$: $\text{med}(Z) = 0$, that is, $P\{Z > 0\} = P\{Z < 0\} = \frac{1}{2}$.
Note that $\text{med}(Z)$ is not necessarily equal to $\text{med}(X) - \text{med}(Y)$, so that $H_0$ is not
that $\text{med}(X) = \text{med}(Y)$ but that $\text{med}(Z) = 0$. The sign test is UMP against one-sided
alternatives and UMP unbiased against two-sided alternatives.


Example 6. We consider an example due to Hahn and Nelson [40], in which two measuring
devices take readings on each of 10 test units. Let $X$ and $Y$, respectively, be the readings
on a test unit by the first and second measuring devices. Let $X = A + \varepsilon_1$, $Y = A + \varepsilon_2$,
where $A$, $\varepsilon_1$, $\varepsilon_2$, respectively, are the contributions to the readings due to the test unit and
to the first and the second measuring devices. Let $A$, $\varepsilon_1$, $\varepsilon_2$ be independent with $EA = \mu$,
$\text{var}(A) = \sigma^2$, $E\varepsilon_1 = E\varepsilon_2 = 0$, $\text{var}(\varepsilon_1) = \sigma_1^2$, $\text{var}(\varepsilon_2) = \sigma_2^2$, so that $X$ and $Y$ have common
mean $\mu$ and variances $\sigma_1^2 + \sigma^2$ and $\sigma_2^2 + \sigma^2$, respectively. Also, the covariance between $X$
and $Y$ is $\sigma^2$. The data are as follows:

Test unit            1    2    3    4    5    6    7    8    9   10

First device, X     71  108   72  140   61   97   90  127  101  114
Second device, Y    77  105   71  152   88  117   93  130  112  105
Z = X - Y           -6    3    1   -8  -17  -20   -3   -3  -11    9


Let us test the hypothesis $H_0$: $\text{med}(Z) = 0$. The number of $Z_i$'s $> 0$ is 3. We have

$$P\{\text{number of } Z_i\text{'s} > 0 \text{ is} \le 3 \mid H_0\} = \sum_{k=0}^{3} \binom{10}{k} \left(\frac{1}{2}\right)^{10} = 0.172.$$

Using the two-sided sign test, we cannot reject $H_0$ at level $\alpha = 0.05$, since $0.172 > 0.025$.
The RVs $Z_i$ can be considered to be distributed normally, so that under $H_0$ the common
mean of $Z_i$'s is 0. Using a paired comparison $t$-test on the data, we can show that $t = -0.88$
for 9 d.f., so we cannot reject the hypothesis of equality of means of $X$ and $Y$ at level
$\alpha = 0.05$.
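
The binomial tail probability used in Example 6 can be reproduced directly from the signs of the differences. The following is a minimal sketch; the function name `sign_test_lower_tail` is an illustrative assumption, and zero differences (which have probability 0 under the continuity assumption) are simply discarded.

```python
from math import comb


def sign_test_lower_tail(diffs):
    """Return (number of positive differences, P{R+ <= observed | H0: median 0})."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    positives = sum(1 for d in nonzero if d > 0)
    # Under H0, R+ ~ b(n, 1/2), so each outcome has weight comb(n, k) / 2^n.
    tail = sum(comb(n, k) for k in range(positives + 1)) / 2**n
    return positives, tail


# Differences Z = X - Y from Example 6
z = [-6, 3, 1, -8, -17, -20, -3, -3, -11, 9]
r_plus, p_lower = sign_test_lower_tail(z)
print(r_plus, round(p_lower, 3))
```

For a two-sided test at level $\alpha$, this one-tailed probability is compared with $\alpha/2$, as in the example.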


Finally, we consider the Wilcoxon signed-ranks test.

13.3.2.2 The Wilcoxon Signed-Ranks Test. The sign test for median and symmetry
loses information since it ignores the magnitude of the difference between the observations
and the hypothesized median. The Wilcoxon signed-ranks test provides an alternative
test of location (and symmetry) that also takes into account the magnitudes of these
differences.

Let $X_1, X_2, \ldots, X_n$ be iid RVs with common absolutely continuous DF $F$, which is symmetric
about the median $\mathfrak{z}_{1/2}$. The problem is to test $H_0: \mathfrak{z}_{1/2} = \mathfrak{z}_0$ against the usual
one- or two-sided alternatives. Without loss of generality, we assume that $\mathfrak{z}_0 = 0$. Then
$F(-x) = 1 - F(x)$ for all $x \in \mathbb{R}$. To test $H_0: F(0) = \frac{1}{2}$ or $\mathfrak{z}_{1/2} = 0$, we first arrange
$|X_1|, |X_2|, \ldots, |X_n|$ in increasing order of magnitude, and assign ranks $1, 2, \ldots, n$, keeping
track of the original signs of $X_i$. For example, if $n = 4$ and $|X_2| < |X_4| < |X_1| < |X_3|$, the
rank of $|X_1|$ is 3, of $|X_2|$ is 1, of $|X_3|$ is 4, and of $|X_4|$ is 2.


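Let

$$
\begin{aligned}
T^+ &= \text{the sum of the ranks of positive } X_i\text{'s}, \\
T^- &= \text{the sum of the ranks of negative } X_i\text{'s}.
\end{aligned}
\tag{20}
$$

Then, under $H_0$, we expect $T^+$ and $T^-$ to be the same. Note that

$$T^+ + T^- = \sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \tag{21}$$

so that $T^+$ and $T^-$ are linearly related and offer equivalent criteria. Let us define

$$
Z_i =
\begin{cases}
1 & \text{if } X_i > 0, \\
0 & \text{if } X_i < 0,
\end{cases}
\qquad i = 1, 2, \ldots, n, \tag{22}
$$

and write $R(|X_i|) = R_i^+$ for the rank of $|X_i|$. Then $T^+ = \sum_{i=1}^{n} R_i^+ Z_i$ and $T^- =
\sum_{i=1}^{n} (1 - Z_i) R_i^+$. Also,

$$
\begin{aligned}
T^+ - T^- &= -\sum_{i=1}^{n} R_i^+ + 2 \sum_{i=1}^{n} Z_i R_i^+ \\
          &= 2 \sum_{i=1}^{n} R_i^+ Z_i - \frac{n(n+1)}{2}.
\end{aligned}
\tag{23}
$$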


The statistic $T^+$ (or $T^-$) is known as the Wilcoxon statistic. A large value of $T^+$ (or,
equivalently, a small value of $T^-$) means that most of the large deviations from 0 are
positive, and therefore we reject $H_0$ in favor of the alternative, $H_1: \mathfrak{z}_{1/2} > 0$.

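A similar analysis applies to the other two alternatives. We record the results as follows:


                         Test
$H_0$                    $H_1$                      Reject $H_0$ if

$\mathfrak{z}_{1/2} = 0$     $\mathfrak{z}_{1/2} > 0$       $T^+ > c_1$
$\mathfrak{z}_{1/2} = 0$     $\mathfrak{z}_{1/2} < 0$       $T^+ < c_2$
$\mathfrak{z}_{1/2} = 0$     $\mathfrak{z}_{1/2} \ne 0$     $T^+ < c_3$ or $T^+ > c_4$


We now show how the Wilcoxon signed-ranks test statistic is related to the U-statistic
estimate of $g_2(F) = P_F(X_1 + X_2 > 0)$. Recall from Example 13.2.6 that the corresponding
U-statistic is

$$U_2(\mathbf{X}) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]}. \tag{24}$$

First note that

$$\sum_{1 \le i \le j \le n} I_{[X_i + X_j > 0]} = \sum_{j=1}^{n} I_{[X_j > 0]} + \sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]}. \tag{25}$$

Next, let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ denote the observations arranged in increasing order of
absolute value, and note that for $i \le j$, $X_{(i)} + X_{(j)} > 0$ if and only if $X_{(j)} > 0$ and
$|X_{(i)}| \le |X_{(j)}|$. It follows that $\sum_{i=1}^{j} I_{[X_{(i)} + X_{(j)} > 0]}$ is the signed-rank of $X_{(j)}$.
Consequently,

$$
\begin{aligned}
T^+ &= \sum_{j=1}^{n} \sum_{i=1}^{j} I_{[X_{(i)} + X_{(j)} > 0]} = \sum_{1 \le i \le j \le n} I_{[X_i + X_j > 0]} \\
    &= \sum_{j=1}^{n} I_{[X_j > 0]} + \sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]} \\
    &= n U_1(\mathbf{X}) + \binom{n}{2} U_2(\mathbf{X}),
\end{aligned}
\tag{26}
$$

where $U_1$ is the U-statistic for $g_1(F) = P_F(X_1 > 0)$.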
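
Identity (26) can be checked numerically: $T^+$ computed from ranks must equal the number of pairs $i \le j$ with $X_i + X_j > 0$. The following is a minimal sketch that assumes no ties among the absolute values; the function names are illustrative assumptions, and the data are the observations of Example 5 measured from the hypothesized median $-1.0$.

```python
def wilcoxon_t_plus(values):
    """Sum of the ranks of |values| that belong to positive values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if values[i] > 0)


def count_positive_pair_sums(values):
    """Number of pairs i <= j with values[i] + values[j] > 0, as in (25)-(26)."""
    n = len(values)
    return sum(1 for i in range(n) for j in range(i, n)
               if values[i] + values[j] > 0)


sample = [-0.465, 0.120, -0.238, -0.869, -1.016, 0.417, 0.056, 0.561]
diffs = [x + 1.0 for x in sample]  # deviations from the hypothesized median -1.0
t_plus = wilcoxon_t_plus(diffs)
pair_count = count_positive_pair_sums(diffs)
print(t_plus, pair_count)
```

Both computations give the same integer, which is the content of (26) with $nU_1$ counting the diagonal pairs $i = j$ and $\binom{n}{2} U_2$ counting the pairs $i < j$.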

We next compute the distribution of $T^+$ for small samples. The distribution of $T^+$ is
tabulated by Kraft and Van Eeden [55, pp. 221-223].

Let

$$
Z_{(i)} =
\begin{cases}
1 & \text{if the } X_j \text{ whose absolute value } |X_j| \text{ has rank } i \text{ is} > 0, \\
0 & \text{otherwise.}
\end{cases}
$$

Note that $T^+ = 0$ if all differences have negative signs, and $T^+ = n(n+1)/2$ if all differences
have positive signs. Here a difference means a difference between the observations
and the postulated value of the median. $T^+$ is completely determined by the indicators $Z_{(i)}$,
so that the sample space can be considered as a set of $2^n$ $n$-tuples $(z_1, z_2, \ldots, z_n)$, where
each $z_i$ is 0 or 1. Under $H_0$, $\mathfrak{z}_{1/2} = \mathfrak{z}_0$ and each arrangement is equally likely. Thus

$$
P_{H_0}\{T^+ = t\} = \frac{\{\text{number of ways to assign} + \text{or} - \text{signs to integers } 1, 2, \ldots, n \text{ so that the sum is } t\}}{2^n}
= \frac{n(t)}{2^n}, \text{ say.} \tag{27}
$$

Note that every assignment has a conjugate assignment with plus and minus signs
interchanged so that for this conjugate, $T^+$ is given by

$$\sum_{i=1}^{n} i (1 - Z_{(i)}) = \frac{n(n+1)}{2} - \sum_{i=1}^{n} i Z_{(i)}. \tag{28}$$

Thus under $H_0$ the distribution of $T^+$ is symmetric about the mean $n(n+1)/4$.


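Example 7. Let us compute the null distribution for $n = 3$. $E_{H_0} T^+ = n(n+1)/4 = 3$, and
$T^+$ takes values from 0 to $n(n+1)/2 = 6$:


                     Ranks Associated with
Value of $T^+$       Positive Differences       $n(t)$

      6                   1, 2, 3                  1
      5                   2, 3                     1
      4                   1, 3                     1
      3                   1, 2; 3                  2
      2                   2                        1
      1                   1                        1
      0                   None                     1


so that

$$
P_{H_0}\{T^+ = t\} =
\begin{cases}
\frac{1}{8}, & t = 4, 5, 6, 0, 1, 2, \\
\frac{2}{8}, & t = 3, \\
0, & \text{otherwise.}
\end{cases}
\tag{29}
$$

Similarly, for $n = 4$, one can show that

$$
P_{H_0}\{T^+ = t\} =
\begin{cases}
\frac{1}{16}, & t = 0, 1, 2, 8, 9, 10, \\
\frac{1}{8}, & t = 3, 4, 5, 6, 7, \\
0, & \text{otherwise.}
\end{cases}
\tag{30}
$$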

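An alternative procedure would be to use the MGF technique. Under $H_0$, the RVs $iZ_{(i)}$
are independent and have the PMF

$$P\{iZ_{(i)} = i\} = P\{iZ_{(i)} = 0\} = \frac{1}{2}.$$

Thus

$$M(t) = E e^{tT^+} = \prod_{i=1}^{n} \left( \frac{e^{it} + 1}{2} \right). \tag{31}$$

We express $M(t)$ as a sum of terms of the form $a_j e^{jt}/2^n$. The PMF of $T^+$ can then be
determined by inspection. For example, in the case $n = 4$, we have

$$
\begin{aligned}
M(t) &= \prod_{i=1}^{4} \left( \frac{e^{it} + 1}{2} \right)
      = \left( \frac{e^{t} + 1}{2} \right) \left( \frac{e^{2t} + 1}{2} \right) \left( \frac{e^{3t} + 1}{2} \right) \left( \frac{e^{4t} + 1}{2} \right) \\
     &= \frac{1}{4} (e^{3t} + e^{2t} + e^{t} + 1) \left( \frac{e^{3t} + 1}{2} \right) \left( \frac{e^{4t} + 1}{2} \right) && (32) \\
     &= \frac{1}{8} (e^{6t} + e^{5t} + e^{4t} + 2e^{3t} + e^{2t} + e^{t} + 1) \left( \frac{e^{4t} + 1}{2} \right) && (33) \\
     &= \frac{1}{16} (e^{10t} + e^{9t} + e^{8t} + 2e^{7t} + 2e^{6t} + 2e^{5t} + 2e^{4t} + 2e^{3t} + e^{2t} + e^{t} + 1). && (34)
\end{aligned}
$$

This method gives us the PMF of $T^+$ for $n = 2$, $n = 3$, and $n = 4$ immediately. Quite
simply,

$$P_{H_0}\{T^+ = j\} = \text{coefficient of } e^{jt} \text{ in the expansion of } M(t), \quad j = 0, 1, \ldots, n(n+1)/2. \tag{35}$$

See Problem 3.3.12 for the PGF of $T^+$.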
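
Expanding the product in (31) amounts to multiplying polynomials in $s = e^t$, which is easy to mechanize. The following is a minimal sketch of that expansion; the function names are illustrative assumptions, and exact fractions are used so the probabilities can be compared with (29) and (30).

```python
from fractions import Fraction


def signed_rank_null_counts(n: int):
    """Coefficients n(t), t = 0, ..., n(n+1)/2, of prod_{i=1}^{n} (1 + s^i)."""
    counts = [1]  # the empty product
    for i in range(1, n + 1):
        new = counts + [0] * i
        # Multiplying by (1 + s^i) adds a copy of the coefficients shifted by i.
        for t, c in enumerate(counts):
            new[t + i] += c
        counts = new
    return counts


def signed_rank_null_pmf(n: int):
    """P_{H0}{T+ = t} = n(t) / 2^n, as in (27) and (35)."""
    return [Fraction(c, 2**n) for c in signed_rank_null_counts(n)]


print(signed_rank_null_counts(3))
print(signed_rank_null_counts(4))
```

The two printed lists reproduce the $n(t)$ column of Example 7 and the coefficients in (34), respectively.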


Example 8. Let us return to the data of Example 5 and test $H_0: \mathfrak{z}_{1/2} = \mu = -1.0$ against
$H_1: \mathfrak{z}_{1/2} > -1.0$. Ranking $|x_i - \mathfrak{z}_{1/2}|$ in increasing order of magnitude, we have

0.016 < 0.131 < 0.535 < 0.762 < 1.056 < 1.120 < 1.417 < 1.561

  5       4       1       3       7       2       6       8

Thus

$$r_1 = 3, \quad r_2 = 6, \quad r_3 = 4, \quad r_4 = 2, \quad r_5 = 1, \quad r_6 = 7, \quad r_7 = 5, \quad r_8 = 8,$$

and

$$T^+ = 3 + 6 + 4 + 2 + 7 + 5 + 8 = 35.$$

From Table ST10, $H_0$ is rejected at level $\alpha = 0.05$ if $T^+ > 31$. Since $35 > 31$, we reject $H_0$.

Remark 1. The Wilcoxon test statistic can also be used to test for symmetry. Let
$X_1, X_2, \ldots, X_n$ be iid observations on an RV with absolutely continuous DF $F$. We set the
null hypothesis as

$$H_0: \mathfrak{z}_{1/2} = \mathfrak{z}_0, \text{ and DF } F \text{ is symmetric about } \mathfrak{z}_0.$$

The alternative is

$$H_1: \mathfrak{z}_{1/2} \ne \mathfrak{z}_0 \text{ and } F \text{ symmetric, or } F \text{ asymmetric.}$$


The test is the same since the null distribution of $T^+$ is the same.

Remark 2. If we have $n$ independent pairs of observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$
from a bivariate DF, we form the differences $Z_i = X_i - Y_i$, $i = 1, 2, \ldots, n$. Assuming that
$Z_1, Z_2, \ldots, Z_n$ are (independent) observations from a population of differences with absolutely
continuous DF $F$ that is symmetric with median $\mathfrak{z}_{1/2}$, we can use the Wilcoxon
statistic to test $H_0: \mathfrak{z}_{1/2} = \mathfrak{z}_0$.


We present some examples.

Example 9. For the data of Example 10.3.3 let us apply the Wilcoxon statistic to test
$H_0: \mathfrak{z}_{1/2} = 0$ and $F$ is symmetric against $H_1: \mathfrak{z}_{1/2} \ne 0$ and $F$ symmetric or $F$ not
symmetric.

The absolute values, when arranged in increasing order of magnitude, are as follows
(the index of each observation is shown beneath it):

0.057 < 0.068 < 0.137 < 0.261 < 0.323 < 0.464 < 0.482 < 0.486 < 0.508 < 0.513
  13      5       2      17       4       1      11      15      20       7

< 0.525 < 0.595 < 0.881 < 0.906 < 1.046 < 1.229 < 1.237 < 1.678 < 1.787 < 2.455
    8       9      10       6      19      14      18      12      16       3

Thus

$$
\begin{aligned}
& r_1 = 6, \quad r_2 = 3, \quad r_3 = 20, \quad r_4 = 5, \quad r_5 = 2, \quad r_6 = 14, \\
& r_7 = 10, \quad r_8 = 11, \quad r_9 = 12, \quad r_{10} = 13, \quad r_{11} = 7, \quad r_{12} = 18, \\
& r_{13} = 1, \quad r_{14} = 16, \quad r_{15} = 8, \quad r_{16} = 19, \quad r_{17} = 4, \quad r_{18} = 17, \\
& r_{19} = 15, \quad r_{20} = 9,
\end{aligned}
$$

and

$$T^+ = 6 + 3 + 20 + 14 + 12 + 13 + 18 + 17 + 15 = 118.$$

From Table ST10 we see that $H_0$ cannot be rejected even at level $\alpha = 0.20$.


Example 10. Returning to the data of Example 6, we apply the Wilcoxon test to the differences
$Z_i = X_i - Y_i$. The differences are -6, 3, 1, -8, -17, -20, -3, -3, -11, 9. To test
$H_0: \mathfrak{z}_{1/2} = 0$ against $H_1: \mathfrak{z}_{1/2} \ne 0$, we rank the absolute values of $z_i$ in increasing order to
get

$$1 < 3 = 3 = 3 < 6 < 8 < 9 < 11 < 17 < 20$$

and

$$T^+ = 1 + 2 + 7 = 10.$$

Here we have assigned ranks 2, 3, 4 to observations +3, -3, -3. (If we assign rank 4 to
observation 3, then $T^+ = 12$ without appreciably changing the result.)

From Table ST10, we reject $H_0$ at $\alpha = 0.05$ if either $T^+ > 46$ or $T^+ < 9$. Since $T^+ > 9$
and $< 46$, we accept $H_0$. Note that hypothesis $H_0$ was also accepted by the sign test.

For large samples we use the normal approximation. In fact, from (26) we see that

$$\frac{\sqrt{n}}{\binom{n}{2}} (T^+ - ET^+) = \frac{n^{3/2}}{\binom{n}{2}} (U_1 - EU_1) + \sqrt{n} (U_2 - EU_2).$$

Clearly, $U_1 - EU_1 \xrightarrow{P} 0$ and since $n^{3/2} / \binom{n}{2} \to 0$, the first term $\to 0$ in probability as
$n \to \infty$. By Slutsky's theorem (Theorem 7.2.15) it follows that

$$\frac{\sqrt{n}}{\binom{n}{2}} (T^+ - ET^+) \quad \text{and} \quad \sqrt{n} (U_2 - EU_2)$$

have the same limiting distribution. From Theorem 13.2.3 and Example 13.2.7 it follows
that $\sqrt{n} (U_2 - EU_2)$, and hence $(T^+ - ET^+) \sqrt{n} / \binom{n}{2}$, has a limiting normal distribution
with mean 0 and variance

$$4\zeta_1 = 4 P_F(X_1 + X_2 > 0, X_1 + X_3 > 0) - 4 P_F^2(X_1 + X_2 > 0).$$

Under $H_0$, the RVs $Z_{(i)}$ are independent $b(1, 1/2)$ so

$$E_{H_0} T^+ = \frac{n(n+1)}{4} \quad \text{and} \quad \text{var}_{H_0} T^+ = \frac{1}{4} \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{24}.$$

Also, under $H_0$, $F$ is continuous and symmetric so

$$P_F(X_1 + X_2 > 0) = \int_{-\infty}^{\infty} P_F(X_1 > -x) f(x)\, dx = \frac{1}{2}$$

and

$$P_F(X_1 + X_2 > 0, X_1 + X_3 > 0) = \int_{-\infty}^{\infty} [P_F(X_1 > -x)]^2 f(x)\, dx = \frac{1}{3}.$$

Thus $4\zeta_1 = 4/3 - 4/4 = 1/3$ so that

$$\frac{\sqrt{3n}\, (T^+ - ET^+)}{\binom{n}{2}} \xrightarrow{L} N(0, 1).$$

However,

$$\frac{\binom{n}{2} / \sqrt{3n}}{(\text{var}_{H_0} T^+)^{1/2}} = \frac{n(n-1) / (2\sqrt{3n})}{[n(n+1)(2n+1)/24]^{1/2}} \to 1$$

as $n \to \infty$. Consequently, under $H_0$

$$\frac{T^+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \xrightarrow{L} N(0, 1).$$

Thus, for large enough $n$ we can determine the critical values for a test based on $T^+$ by
using normal approximation.

As an example, take $n = 20$. From Table ST10 the P-value associated with $t^+ = 140$ is
0.10. Using normal approximation

$$P_{H_0}(T^+ > 140) \approx P\left( Z > \frac{140 - 105}{27.45} \right) = P(Z > 1.28) = 0.1003.$$
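
The values $\frac{1}{2}$ and $\frac{1}{3}$ used in the large-sample variance computation above follow from symmetry and a change of variable. A short derivation, assuming that $F$ has PDF $f$ and is symmetric about 0, so that $P_F(X_1 > -x) = 1 - F(-x) = F(x)$, and substituting $u = F(x)$:

```latex
\begin{aligned}
P_F(X_1 + X_2 > 0)
  &= \int_{-\infty}^{\infty} F(x)\, f(x)\, dx
   = \int_{0}^{1} u\, du = \frac{1}{2}, \\
P_F(X_1 + X_2 > 0,\; X_1 + X_3 > 0)
  &= \int_{-\infty}^{\infty} [F(x)]^{2} f(x)\, dx
   = \int_{0}^{1} u^{2}\, du = \frac{1}{3}.
\end{aligned}
```

In the second line the two events are conditionally independent given the shared variable, which is why the integrand is a square.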


PROBLEMS 13.3 


1. Prove Theorem 4. 

2. A random sample of size 16 from a continuous DF on (0, 1] yields the following 
data: 0.59, 0.72, 0.47, 0.43, 0.31, 0.56, 0.22, 0.90, 0.96, 0.78, 0.66, 0.18, 0.73, 0.43, 
0.58, 0.11. Test the hypothesis that the sample comes from U(0, 1]. 

3. Test the goodness of fit of normality for the data of Problem 10.3.6, using the 
Kolmogorov—Smirnov test. 

4. For the data of Problem 10.3.6 find a 0.95 level confidence band for the distribution 
function. 

5. The following data represent a sample of size 20 from $U[0, 1]$: 0.277, 0.435, 0.130,
0.143, 0.853, 0.889, 0.294, 0.697, 0.940, 0.648, 0.324, 0.482, 0.540, 0.152, 0.477,
0.667, 0.741, 0.882, 0.885, 0.740. Construct a 0.90 level confidence band for $F(x)$.

6. In Problem 5 test the hypothesis that the distribution is $U[0, 1]$. Take $\alpha = 0.05$.

7. For the data of Example 2 test, by means of the sign test, the null hypothesis
$H_0: \mu = 1.5$ against $H_1: \mu \ne 1.5$.

8. For the data of Problem 5 test the hypothesis that the quantile of order p = 0.20 
is 0.20. 

9. For the data of Problem 10.4.8 use the sign test to test the hypothesis of no 
difference between the two averages. 

10. Use the sign test for the data of Problem 10.4.9 to test the hypothesis of no 
difference in grade-point averages. 

11. For the data of Problem 5 apply the signed-rank test to test $H_0: \mathfrak{z}_{1/2} = 0.5$ against
$H_1: \mathfrak{z}_{1/2} \ne 0.5$.

12. For the data of Problems 10.4.8 and 10.4.9 apply the signed-rank test to the
differences to test $H_0: \mathfrak{z}_{1/2} = 0$ against $H_1: \mathfrak{z}_{1/2} \ne 0$.


13.4 SOME TWO-SAMPLE PROBLEMS 


In this section we consider some two-sample tests. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be
independent samples from two absolutely continuous distribution functions $F_X$ and $F_Y$,
respectively. The problem is to test the null hypothesis $H_0: F_X(x) = F_Y(x)$ for all $x \in \mathbb{R}$
against the usual one- and two-sided alternatives.

Tests of $H_0$ depend on the type of alternative specified. We state some of the alternatives
of interest even though we will not consider all of these in this text.


I   Location alternative: $F_Y(x) = F_X(x - \theta)$, $\theta \ne 0$.
II  Scale alternative: $F_Y(x) = F_X(x/\sigma)$, $\sigma > 0$.
III Lehmann alternative: $F_Y(x) = 1 - [1 - F_X(x)]^{\theta+1}$, $\theta + 1 > 0$.
IV  Stochastic alternative: $F_Y(x) \ge F_X(x)$ for all $x$, and $F_Y(x) > F_X(x)$ for at least one $x$.
V   General alternative: $F_Y(x) \ne F_X(x)$ for some $x$.


Some comments are in order. Clearly I through IV are special cases of V. Alternatives I
and II show differences in $F_X$ and $F_Y$ in location and scale, respectively. Alternative III
states that $P(Y > x) = [P(X > x)]^{\theta+1}$. In the special case when $\theta$ is an integer it states that
$Y$ has the same distribution as the smallest of the $\theta + 1$ of $X$-variables. A similar alternative
to test that is sometimes used is $F_Y(x) = [F_X(x)]^{\alpha}$ for some $\alpha > 0$ and all $x$. When $\alpha$ is an
integer, this states that $Y$ is distributed as the largest of the $\alpha$ $X$-variables. Alternative IV
refers to the relative magnitudes of $X$'s and $Y$'s. It states that

$$P(Y \le x) \ge P(X \le x) \quad \text{for all } x,$$

so that

$$P(Y > x) \le P(X > x), \tag{1}$$

for all $x$. In other words, $X$'s tend to be larger than the $Y$'s.


Definition 1. We say that a continuous RV $X$ is stochastically larger than a continuous
RV $Y$ if inequality (1) is satisfied for all $x$ with strict inequality for some $x$.


A similar interpretation may be given to the one-sided alternative $F_X > F_Y$. In the special
case where both $X$ and $Y$ are normal RVs with means $\mu_1$, $\mu_2$ and common variance $\sigma^2$,
$F_X = F_Y$ corresponds to $\mu_1 = \mu_2$ and $F_X > F_Y$ corresponds to $\mu_1 < \mu_2$.

In this section we consider some common two-sample tests for location (Case I) and
stochastic ordering (Case IV) alternatives. First, note that a test of stochastic ordering
may also be used as a test of less restrictive location alternatives since, for example,
$F_X > F_Y$ corresponds to larger $Y$'s and hence larger location for $Y$. Second, we note that
the chi-square test of homogeneity described in Section 10.3 can be used to test general
alternatives (Case V) $H: F(x) \ne G(x)$ for some $x$. Briefly, one partitions the real line into
Borel sets $A_1, A_2, \ldots, A_k$. Let

$$p_{i1} = P(X_j \in A_i) \quad \text{and} \quad p_{i2} = P(Y_j \in A_i),$$

$i = 1, 2, \ldots, k$. Under $H_0: F = G$, $p_{i1} = p_{i2}$, $i = 1, 2, \ldots, k$, which is the problem of testing
equality of two independent multinomial distributions discussed in Section 10.3.

We first consider a simple test of location. This test, based on the sample median of the
combined sample, is a test of the equality of medians of the two DFs. It will tend to accept
$H_0: F = G$ even if the shapes of $F$ and $G$ are different as long as their medians are equal.


13.4.1 Median Test


The combined sample $X_1, X_2, \ldots, X_m, Y_1, Y_2, \ldots, Y_n$ is ordered and a sample median is
found. If $m + n$ is odd, the median is the $[(m + n + 1)/2]$th value in the ordered arrangement.
If $m + n$ is even, the median is any number between the two middle values. Let $V$ be
the number of observed values of $X$ that are less than or equal to the sample median for the
combined sample. If $V$ is large, it is reasonable to conclude that the actual median of $X$ is
smaller than the median of $Y$. One therefore rejects $H_0: F = G$ in favor of $H_1: F(x) \ge G(x)$
for all $x$ and $F(x) > G(x)$ for some $x$ if $V$ is too large, that is, if $V \ge c$. If, however, the
alternative is $F(x) \le G(x)$ for all $x$ and $F(x) < G(x)$ for some $x$, the median test rejects $H_0$
if $V \le c$.

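For the two-sided alternative that $F(x) \ne G(x)$ for some $x$, we use the two-sided test.

We next compute the null distribution of the RV $V$. If $m + n = 2p$, $p$ a positive integer,
then

$$
P_{H_0}\{V = v\} = P_{H_0}\{\text{exactly } v \text{ of the } X_i\text{'s are} \le \text{combined median}\}
= \begin{cases}
\dfrac{\binom{m}{v} \binom{n}{p-v}}{\binom{m+n}{p}}, & v = 0, 1, 2, \ldots, \min(m, p), \\[6pt]
0, & \text{otherwise.}
\end{cases}
\tag{2}
$$

Here $0 \le V \le \min(m, p)$. If $m + n = 2p + 1$, $p > 0$, is an integer, the $[(m + n + 1)/2]$th
value is the median in the combined sample, and

$$
P_{H_0}\{V = v\} = P\{\text{exactly } v \text{ of the } X_i\text{'s are below the } (p+1)\text{th value in the ordered arrangement}\}
= \begin{cases}
\dfrac{\binom{m}{v} \binom{n}{p-v}}{\binom{m+n}{p}}, & v = 0, 1, \ldots, \min(m, p), \\[6pt]
0, & \text{otherwise.}
\end{cases}
\tag{3}
$$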


Remark 1. Under $H_0$ we expect $(m + n)/2$ observations above the median and $(m + n)/2$
below the median. One can therefore apply the chi-square test with 1 d.f. to test $H_0$ against
the two-sided alternative.


Example 1. The following data represent lifetimes (hours) of batteries for two different
brands:


Brand A:  40  30  40  45  55  30
Brand B:  50  50  45  55  60  40

The combined ordered sample is 30, 30, 40, 40, 40, 45, 45, 50, 50, 55, 55, 60. Since
$m + n = 12$ is even, the median is 45. Thus

$$v = \text{number of observed values of } X \text{ that are less than or equal to } 45 = 5.$$

Now

$$P_{H_0}\{V \ge 5\} = \frac{\binom{6}{5} \binom{6}{1} + \binom{6}{6} \binom{6}{0}}{\binom{12}{6}} \approx 0.04.$$

Since $P_{H_0}\{V \ge 5\} > 0.025$, we cannot reject $H_0$ that the two samples come from the same
population.
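
The tail probability in Example 1 comes straight from the hypergeometric form (2). The following is a minimal sketch; the function name `median_test_upper_tail` is an illustrative assumption, and for even $m + n$ the median is taken midway between the two middle values, which is one admissible choice under the definition above.

```python
from math import comb
from statistics import median


def median_test_upper_tail(x, y):
    """Return (v, P_{H0}{V >= v}) for the median test, using formula (2)/(3)."""
    m, n = len(x), len(y)
    p = (m + n) // 2
    med = median(x + y)  # combined-sample median
    v = sum(1 for xi in x if xi <= med)
    # math.comb(n, k) is 0 when k > n, so impossible splits drop out.
    numerator = sum(comb(m, k) * comb(n, p - k) for k in range(v, min(m, p) + 1))
    return v, numerator / comb(m + n, p)


brand_a = [40, 30, 40, 45, 55, 30]
brand_b = [50, 50, 45, 55, 60, 40]
v, tail = median_test_upper_tail(brand_a, brand_b)
print(v, round(tail, 4))
```

For a two-sided test at level 0.05 the printed tail probability is compared with 0.025, as in the example.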


We now consider two tests of the stochastic alternatives. As mentioned earlier they may 
also be used as tests of location. 


13.4.2 Kolmogorov—Smirnov Test 


Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from continuous DFs
$F$ and $G$, respectively. Let $F_m^*$ and $G_n^*$, respectively, be the empirical DFs of the $X$'s and
the $Y$'s. Recall that $F_m^*$ is the U-statistic for $F$ and $G_n^*$ that for $G$. Under $H_0: F(x) = G(x)$
for all $x$, we expect a reasonable agreement between the two sample DFs. We define

$$D_{m,n} = \sup_x |F_m^*(x) - G_n^*(x)|. \tag{4}$$

Then $D_{m,n}$ may be used to test $H_0$ against the two-sided alternative $H_1: F(x) \ne G(x)$ for
some $x$. The test rejects $H_0$ at level $\alpha$ if

$$D_{m,n} \ge D_{m,n,\alpha}, \tag{5}$$

where $P_{H_0}\{D_{m,n} \ge D_{m,n,\alpha}\} \le \alpha$.
Similarly, one can define the one-sided statistics

$$D_{m,n}^+ = \sup_x [F_m^*(x) - G_n^*(x)] \tag{6}$$

and

$$D_{m,n}^- = \sup_x [G_n^*(x) - F_m^*(x)] \tag{7}$$

to be used against the one-sided alternatives

$$G(x) \le F(x) \text{ for all } x \text{ and } G(x) < F(x) \text{ for some } x \tag{8}$$

with rejection region $D_{m,n}^+ \ge D_{m,n,\alpha}^+$, and

$$F(x) \le G(x) \text{ for all } x \text{ and } F(x) < G(x) \text{ for some } x \tag{9}$$

with rejection region $D_{m,n}^- \ge D_{m,n,\alpha}^+$, respectively.

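For small samples tables due to Massey [72] are available. In Table ST9, we give the
values of $D_{m,n,\alpha}$ and $D_{m,n,\alpha}^+$ for some selected values of $m$, $n$, and $\alpha$. Table ST8 gives the
corresponding values for the $m = n$ case.

For large samples we use the limiting result due to Smirnov [107]. Let $N = mn/(m + n)$.
Then

$$
\lim_{m,n \to \infty} P\{\sqrt{N}\, D_{m,n}^+ \le \lambda\} =
\begin{cases}
1 - e^{-2\lambda^2}, & \lambda > 0, \\
0, & \lambda \le 0,
\end{cases}
\tag{10}
$$

$$
\lim_{m,n \to \infty} P\{\sqrt{N}\, D_{m,n} \le \lambda\} =
\begin{cases}
\displaystyle \sum_{j=-\infty}^{\infty} (-1)^j e^{-2 j^2 \lambda^2}, & \lambda > 0, \\
0, & \lambda \le 0.
\end{cases}
\tag{11}
$$

Relations (10) and (11) give the distribution of $D_{m,n}^+$ and $D_{m,n}$, respectively, under
$H_0: F(x) = G(x)$ for all $x \in \mathbb{R}$.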


Example 2. Let us apply the test to data from Example 1. Do the two brands differ with
respect to average life?

Let us first apply the Kolmogorov-Smirnov test to test $H_0$ that the population distribution
of length of life for the two brands is the same.


  x     F*_6(x)   G*_6(x)   |F*_6(x) - G*_6(x)|

 30       2/6        0             2/6
 40       4/6       1/6            3/6
 45       5/6       2/6            3/6
 50       5/6       4/6            1/6
 55        1        5/6            1/6
 60        1         1              0


$$D_{6,6} = \sup_x |F_6^*(x) - G_6^*(x)| = \frac{3}{6}.$$

From Table ST8, the critical value for $m = n = 6$ at level $\alpha = 0.05$ is $D_{6,6,0.05} = \frac{2}{3}$.
Since $D_{6,6} < D_{6,6,0.05}$, we accept $H_0$ that the population distribution for the length of life
for the two brands is the same.

Let us next apply the two-sample $t$-test. We have $\bar{x} = 40$, $\bar{y} = 50$, $s_1^2 = 90$, $s_2^2 = 50$,
$s_p^2 = 70$. Thus

$$t = \frac{40 - 50}{\sqrt{70} \sqrt{\frac{1}{6} + \frac{1}{6}}} = -2.07.$$

Since $t_{10,0.025} = 2.2281$, we accept the hypothesis that the two samples come from the
same (normal) population.
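
The two-sample statistics (4), (6), and (7) only need to be evaluated at the observed values, because both empirical DFs are step functions that jump only there. The following is a minimal sketch using exact fractions; the function name `two_sample_ks` is an illustrative assumption.

```python
from fractions import Fraction


def two_sample_ks(x, y):
    """Return (D+, D-, D) for the empirical DFs of samples x and y."""
    m, n = len(x), len(y)
    d_plus = d_minus = Fraction(0)
    for t in sorted(set(x) | set(y)):
        f = Fraction(sum(1 for v in x if v <= t), m)  # F*_m(t)
        g = Fraction(sum(1 for v in y if v <= t), n)  # G*_n(t)
        d_plus = max(d_plus, f - g)
        d_minus = max(d_minus, g - f)
    return d_plus, d_minus, max(d_plus, d_minus)


brand_a = [40, 30, 40, 45, 55, 30]
brand_b = [50, 50, 45, 55, 60, 40]
d_plus, d_minus, d = two_sample_ks(brand_a, brand_b)
print(d_plus, d_minus, d)
```

Ties between the samples are handled by evaluating both empirical DFs at the same point, which matches the table in the example.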


The second test of stochastic ordering alternatives we consider is the Mann—Whitney— 
Wilcoxon test which can be viewed as a test based on a U-statistic. 


13.4.3 The Mann-Whitney—Wilcoxon Test 


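Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples from two continuous DFs,
$F$ and $G$, respectively. As in Example 13.2.10, let

$$T(X_i; Y_j) = \begin{cases} 1, & \text{if } X_i < Y_j, \\ 0, & \text{if } X_i \ge Y_j, \end{cases}$$

for $i = 1,2,\ldots,m$, $j = 1,2,\ldots,n$. Recall that $T(X_i; Y_j)$ is an unbiased estimator of
$g(F,G) = P_{F,G}(X < Y)$ and the two-sample U-statistic for $g$ is given by
$U_1(\mathbf{X}; \mathbf{Y}) = (mn)^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n} T(X_i; Y_j)$. For notational convenience, let us write

$$U = mn\,U_1(\mathbf{X}; \mathbf{Y}) = \sum_{i=1}^{m} \sum_{j=1}^{n} T(X_i; Y_j). \tag{12}$$

Then $U$ is the number of pairs $(X_i, Y_j)$ with $X_i < Y_j$, that is, the number of $X$'s smaller than
$Y_1$, plus the number smaller than $Y_2$, and so on through $Y_n$.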


The statistic $U$ is called the Mann–Whitney statistic. An alternative equivalent form using
Wilcoxon scores is the linear rank statistic given by

$$W = \sum_{j=1}^{n} Q_j, \tag{13}$$

where $Q_j$ = rank of $Y_j$ among the combined $m + n$ observations. Indeed,

$$Q_j = \text{rank of } Y_j = (\#\text{ of } X_i\text{'s} < Y_j) + \text{rank of } Y_j \text{ in the } Y\text{'s}.$$


Thus

$$W = U + \sum_{j=1}^{n} j = U + \frac{n(n+1)}{2}, \tag{14}$$

so that $U$ and $W$ are equivalent test statistics. Hence the name Mann–Whitney–Wilcoxon
test. We will restrict attention to $U$ as the test statistic.

Example 3. Let $m = 4$, $n = 3$, and suppose that the combined sample when ordered is as
follows:

$$x_2 < x_1 < y_3 < y_2 < x_4 < y_1 < x_3.$$

Then $U = 7$, since there are three values of $x < y_1$, two values of $x < y_2$, and two values
of $x < y_3$. Also, $W = 13$, so $U = 13 - 3(4)/2 = 7$.


Note that $U = 0$ if all the $X_i$'s are larger than all the $Y_j$'s and $U = mn$ if all the $X_i$'s are
smaller than all the $Y_j$'s, because then there are $m$ $X$'s $< Y_1$, $m$ $X$'s $< Y_2$, and so on. Thus
$0 \le U \le mn$. If $U$ is large, the values of $Y$ tend to be larger than the values of $X$ ($Y$ is
stochastically larger than $X$), and this supports the alternative $F(x) \ge G(x)$ for all $x$ and
$F(x) > G(x)$ for some $x$. Similarly, if $U$ is small, the $Y$ values tend to be smaller than the $X$
values, and this supports the alternative $F(x) \le G(x)$ for all $x$ and $F(x) < G(x)$ for some $x$.
We summarize these results as follows:


$H_0$        $H_1$          Reject $H_0$ if

$F = G$      $F \ge G$      $U \ge c_1$

$F = G$      $F \le G$      $U \le c_2$

$F = G$      $F \ne G$      $U \ge c_3$ or $U \le c_4$


To compute the critical values we need the null distribution of $U$. Let

$$p_{m,n}(u) = P_{H_0}\{U = u\}. \tag{15}$$

We will set up a difference equation relating $p_{m,n}$ to $p_{m-1,n}$ and $p_{m,n-1}$. If the observations
are arranged in increasing order of magnitude, the largest value can be either an $x$ value or
a $y$ value. Under $H_0$, all $m+n$ values are equally likely to be the largest, so the probability that the
largest value will be an $x$ value is $m/(m+n)$ and that it will be a $y$ value is $n/(m+n)$.

Now, if the largest value is an $x$, it does not contribute to $U$, and the remaining $m - 1$
values of $x$ and $n$ values of $y$ can be arranged to give the observed value $U = u$ with
probability $p_{m-1,n}(u)$. If the largest value is a $y$, this value is larger than all the $m$ $x$'s. Thus,
to get $U = u$, the remaining $n - 1$ values of $y$ and $m$ values of $x$ must contribute $U = u - m$. It
follows that

$$p_{m,n}(u) = \frac{m}{m+n}\,p_{m-1,n}(u) + \frac{n}{m+n}\,p_{m,n-1}(u - m). \tag{16}$$


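If $m = 0$, then for $n \ge 1$

$$p_{0,n}(u) = \begin{cases} 1 & \text{if } u = 0, \\ 0 & \text{if } u > 0. \end{cases} \tag{17}$$

If $n = 0$, $m \ge 1$, then

$$p_{m,0}(u) = \begin{cases} 1 & \text{if } u = 0, \\ 0 & \text{if } u > 0, \end{cases} \tag{18}$$

and

$$p_{m,n}(u) = 0 \quad \text{if } u < 0,\; m \ge 0,\; n \ge 0. \tag{19}$$

For small values of $m$ and $n$ one can easily compute the null PMF of $U$. Thus, if
$m = n = 1$, then

$$p_{1,1}(0) = p_{1,1}(1) = \tfrac{1}{2}.$$

If $m = 1$, $n = 2$, then

$$p_{1,2}(0) = p_{1,2}(1) = p_{1,2}(2) = \tfrac{1}{3}.$$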
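
The difference equation (16) with boundary conditions (17) to (19) translates directly into a short recursive routine. The following is a minimal sketch, not a tuned implementation; the function names `p` and `upper_tail` are illustrative choices, and exact rational arithmetic is used so that the small cases above can be checked without rounding.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def p(m, n, u):
    """Null PMF P{U = u} for m X-values and n Y-values, via relations (16)-(19)."""
    if u < 0:
        return Fraction(0)                      # relation (19)
    if m == 0 or n == 0:
        return Fraction(1) if u == 0 else Fraction(0)   # relations (17), (18)
    # Relation (16): condition on whether the largest observation is an x or a y.
    return (Fraction(m, m + n) * p(m - 1, n, u)
            + Fraction(n, m + n) * p(m, n - 1, u - m))


def upper_tail(m, n, u):
    """P{U >= u} under the null hypothesis."""
    return sum(p(m, n, k) for k in range(u, m * n + 1))


print([p(1, 2, u) for u in range(3)])   # the case m = 1, n = 2
print(float(upper_tail(8, 9, 54)))      # an upper-tail probability for m = 8, n = 9
```

Memoization keeps the recursion cheap for the table-sized values of $m$ and $n$ considered here.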


Tables for critical values are available for small values of $m$ and $n$, $m \le n$. See, for
example, Auble [3] or Mann and Whitney [71]. Table ST11 gives the values of $u_\alpha$ for
which $P_{H_0}\{U > u_\alpha\} \le \alpha$ for some selected values of $m$, $n$, and $\alpha$.

If $m$, $n$ are large we can use the asymptotic normality of $U$. In Example 13.2.11 we
showed that, under $H_0$,

$$\frac{U/(mn) - \frac{1}{2}}{\sqrt{(m+n+1)/(12mn)}} \xrightarrow{L} N(0,1)$$

as $m, n \to \infty$ such that $m/(m+n) \to$ constant. The approximation is fairly good for
$m, n \ge 8$.


Example 4. Two samples are as follows: 


Values of Xj: 1,2,3,5,7,9, 11,18 
Values of Y;: 4,6,8, 10,12, 13,14, 15,19 


Thus $m = 8$, $n = 9$, and $U = 3+4+5+6+7+7+7+7+8 = 54$. The (exact) $p$-value
$P_{H_0}(U \ge 54) = 0.046$, so we reject $H_0$ at (two-tailed) level $\alpha = 0.1$. Let us apply the
normal approximation. We have

$$E_{H_0}U = \frac{8 \cdot 9}{2} = 36, \qquad \operatorname{var}_{H_0}(U) = \frac{8 \cdot 9}{12}(8 + 9 + 1) = 108,$$

and

$$Z = \frac{54 - 36}{\sqrt{108}} = \frac{18}{6\sqrt{3}} = \sqrt{3} = 1.732.$$

We note that $P(Z > 1.73) = 0.042$.
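
As a check on Example 4, the statistic $U$, the rank sum $W$ of relation (14), and the normal approximation can be computed as follows. This is a small sketch that assumes no ties in the combined sample; the helper names are illustrative, and the normal tail is obtained from `math.erf` rather than from a table.

```python
from math import erf, sqrt


def mann_whitney_u(xs, ys):
    """Number of pairs (x, y) with x < y."""
    return sum(1 for y in ys for x in xs if x < y)


def wilcoxon_w(xs, ys):
    """Sum of the ranks of the y's in the combined sample (no ties assumed)."""
    combined = sorted(xs + ys)
    return sum(combined.index(y) + 1 for y in ys)


xs = [1, 2, 3, 5, 7, 9, 11, 18]
ys = [4, 6, 8, 10, 12, 13, 14, 15, 19]
m, n = len(xs), len(ys)

u = mann_whitney_u(xs, ys)
w = wilcoxon_w(xs, ys)
mean_u = m * n / 2
var_u = m * n * (m + n + 1) / 12
z = (u - mean_u) / sqrt(var_u)
p_upper = 0.5 * (1 - erf(z / sqrt(2)))   # P(Z > z) for standard normal Z

print(u, w, round(z, 3), round(p_upper, 3))
```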
PROBLEMS 13.4


1. For the data of Example 4 apply the median test. 

2. Twelve 4-year-old boys and twelve 4-year-old girls were observed during two 
15-minute play sessions, and each child’s play during these two periods was scored 
as follows for incidence and degree of aggression: 


Boys:  86, 69, 72, 65, 113, 65, 118, 45, 141, 104, 41, 50
Girls: 55, 40, 22, 58, 16, 7, 9, 16, 26, 36, 20, 15


Test the hypothesis that there were sex differences in the amount of aggression 
shown, using (a) the median test and (b) the Mann-Whitney-Wilcoxon test (Siegel 
[105]). 

3. To compare the variability of two brands of tires, the following mileages (1000 
miles) were obtained for eight tires of each kind: 


Brand A: 32.1, 20.6, 17.8, 28.4, 19.6, 21.4, 19.9, 30.1 
Brand B: 19.8, 27.6, 30.8, 27.6, 34.1, 18.7, 16.9, 17.9


Test the null hypothesis that the two samples come from the same population, using 
the Mann—Whitney—Wilcoxon test. 

4. Use the data of Problem 2 to apply the Kolmogorov—Smirnov test. 

5. Apply the Kolmogorov–Smirnov test to the data of Problem 3.

6. Yet another test for testing $H_0: F = G$ against general alternatives is the so-called
runs test. A run is a succession of one or more identical symbols which are preceded
and followed by a different symbol (or no symbol). The length of a run
is the number of like symbols in a run. The total number of runs, $R$, in the combined
sample of $X$'s and $Y$'s when arranged in increasing order can be used as a
test of $H_0$. Under $H_0$ the $X$ and $Y$ symbols are expected to be well-mixed. A small
value of $R$ supports $H_1: F \ne G$. A test based on $R$ is appropriate only for two-sided
(general) alternatives. Tables of critical values are available. For large samples, one
uses the normal approximation:

$$R \sim AN\left(1 + \frac{2mn}{m+n},\; \frac{2mn(2mn - m - n)}{(m+n-1)(m+n)^2}\right).$$

(a) Let $R_1$ = # of $X$-runs, $R_2$ = # of $Y$-runs, and $R = R_1 + R_2$. Under $H_0$, show that

$$P(R_1 = r_1, R_2 = r_2) = k\,\frac{\binom{m-1}{r_1-1}\binom{n-1}{r_2-1}}{\binom{m+n}{m}},$$

where $k = 2$ if $r_1 = r_2$, $k = 1$ if $|r_1 - r_2| = 1$, $r_1 = 1,2,\ldots,m$ and $r_2 = 1,2,\ldots,n$.

(b) Show that

$$P_{H_0}(R_1 = r_1) = \frac{\binom{m-1}{r_1-1}\binom{n+1}{r_1}}{\binom{m+n}{m}}, \qquad 0 \le r_1 \le m.$$

7. Fifteen 3-year-old boys and 15 3-year-old girls were observed during two sessions
of recess in a nursery school. Each child’s play was scored for incidence and degree 
of aggression as follows: 


Boys: 96 65 74 78 82 121 68 79 111 48 53 92 81 31 40 
Girls: 12 47 32 59 83 14 32 15 17 82 21 34 9 15 51 


Is there evidence to suggest that there are sex differences in the incidence and amount 
of aggression? Use both Mann—Whitney—Wilcoxon and runs tests. 


13.5 TESTS OF INDEPENDENCE 


Let $X$ and $Y$ be two RVs with joint DF $F(x,y)$, and let $F_1$ and $F_2$, respectively, be
the marginal DFs of $X$ and $Y$. In this section we study some tests of the hypothesis of
independence, namely,

$$H_0: F(x,y) = F_1(x)F_2(y) \quad \text{for all } (x,y) \in \mathbb{R}^2$$

against the alternative

$$H_1: F(x,y) \ne F_1(x)F_2(y) \quad \text{for some } (x,y).$$

If the joint distribution function $F$ is bivariate normal, we know that $X$ and $Y$ are independent
if and only if the correlation coefficient $\rho = 0$. In this case, the test of independence
is to test $H_0: \rho = 0$.

In the nonparametric situation the most commonly used test of independence is the 
chi-square test, which we now study. 


13.5.1 Chi-square Test of Independence—Contingency Tables 


Let $X$ and $Y$ be two RVs, and suppose that we have $n$ observations on $(X,Y)$. Let us
divide the space of values assumed by $X$ (the real line) into $r$ mutually exclusive intervals
$A_1, A_2, \ldots, A_r$. Similarly, the space of values of $Y$ is divided into $c$ disjoint intervals
$B_1, B_2, \ldots, B_c$. As a rule of thumb, we choose the length of each interval in such a way
that the probability that $X$ ($Y$) lies in an interval is approximately $1/r$ ($1/c$). Moreover,
it is desirable to have $n/r$ and $n/c$ at least equal to 5. Let $X_{ij}$ denote the number of pairs
$(X_k, Y_k)$, $k = 1,2,\ldots,n$, that lie in $A_i \times B_j$, and let

$$p_{ij} = P\{(X,Y) \in A_i \times B_j\} = P\{X \in A_i \text{ and } Y \in B_j\}, \tag{1}$$


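where $i = 1,2,\ldots,r$, $j = 1,2,\ldots,c$. If each $p_{ij}$ is known, the quantity

$$\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(X_{ij} - np_{ij})^2}{np_{ij}} \tag{2}$$

has approximately a chi-square distribution with $rc - 1$ d.f., provided that $n$ is large (see
Theorem 10.3.2). If $X$ and $Y$ are independent, $P\{(X,Y) \in A_i \times B_j\} = P\{X \in A_i\}P\{Y \in B_j\}$.
Let us write $p_{i\cdot} = P\{X \in A_i\}$ and $p_{\cdot j} = P\{Y \in B_j\}$. Then under $H_0$: $p_{ij} = p_{i\cdot}\,p_{\cdot j}$,
$i = 1,2,\ldots,r$, $j = 1,2,\ldots,c$. In practice, $p_{ij}$ will not be known. We replace $p_{ij}$ by their
estimates. Under $H_0$, we estimate $p_{i\cdot}$ by

$$\hat{p}_{i\cdot} = \sum_{j=1}^{c} \frac{X_{ij}}{n}, \qquad i = 1,2,\ldots,r, \tag{3}$$

and $p_{\cdot j}$ by

$$\hat{p}_{\cdot j} = \sum_{i=1}^{r} \frac{X_{ij}}{n}, \qquad j = 1,2,\ldots,c. \tag{4}$$

Since $\sum_{j=1}^{c} \hat{p}_{\cdot j} = 1 = \sum_{i=1}^{r} \hat{p}_{i\cdot}$, we have estimated only $r - 1 + c - 1 = r + c - 2$ parameters.
It follows (see Theorem 10.3.4) that the RV

$$U = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(X_{ij} - n\hat{p}_{i\cdot}\hat{p}_{\cdot j})^2}{n\hat{p}_{i\cdot}\hat{p}_{\cdot j}} \tag{5}$$

is asymptotically distributed as $\chi^2$ with $rc - 1 - (r + c - 2) = (r-1)(c-1)$ d.f., under $H_0$.
The null hypothesis is rejected if the computed value of $U$ exceeds $\chi^2_{(r-1)(c-1),\alpha}$.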

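It is frequently convenient to list the observed and expected frequencies of the $rc$ events
$A_i \times B_j$ in an $r \times c$ table, called a contingency table, as follows:


Observed Frequencies, $O_{ij}$

          $B_1$       $B_2$       ...   $B_c$       Total
$A_1$     $X_{11}$    $X_{12}$    ...   $X_{1c}$    $\sum_j X_{1j}$
$A_2$     $X_{21}$    $X_{22}$    ...   $X_{2c}$    $\sum_j X_{2j}$
...
$A_r$     $X_{r1}$    $X_{r2}$    ...   $X_{rc}$    $\sum_j X_{rj}$
Total     $\sum_i X_{i1}$   $\sum_i X_{i2}$   ...   $\sum_i X_{ic}$   $n$


Expected Frequencies, $E_{ij}$

          $B_1$       $B_2$       ...   $B_c$       Total
$A_1$     $n\hat{p}_{1\cdot}\hat{p}_{\cdot 1}$    $n\hat{p}_{1\cdot}\hat{p}_{\cdot 2}$    ...   $n\hat{p}_{1\cdot}\hat{p}_{\cdot c}$    $n\hat{p}_{1\cdot}$
$A_2$     $n\hat{p}_{2\cdot}\hat{p}_{\cdot 1}$    $n\hat{p}_{2\cdot}\hat{p}_{\cdot 2}$    ...   $n\hat{p}_{2\cdot}\hat{p}_{\cdot c}$    $n\hat{p}_{2\cdot}$
...
$A_r$     $n\hat{p}_{r\cdot}\hat{p}_{\cdot 1}$    $n\hat{p}_{r\cdot}\hat{p}_{\cdot 2}$    ...   $n\hat{p}_{r\cdot}\hat{p}_{\cdot c}$    $n\hat{p}_{r\cdot}$
Total     $n\hat{p}_{\cdot 1}$    $n\hat{p}_{\cdot 2}$    ...   $n\hat{p}_{\cdot c}$    $n$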


Note that the $X_{ij}$'s in the table are frequencies. Once the category $A_i \times B_j$ is determined
for an observation $(X,Y)$, numerical values of $X$ and $Y$ are irrelevant. Next, we need to
compute the expected frequency table. This is done quite simply by multiplying the row
and column totals for each pair $(i,j)$ and dividing the product by $n$. Then we compute the
quantity

$$\sum_{i} \sum_{j} \frac{(E_{ij} - O_{ij})^2}{E_{ij}}$$

and compare it with the tabulated $\chi^2$ value. In this form the test can be applied even to
qualitative data. $A_1, A_2, \ldots, A_r$ and $B_1, B_2, \ldots, B_c$ represent the two attributes, and the null
hypothesis to be tested is that the attributes $A$ and $B$ are independent.


Example 1. The following are the results for a random sample of 400 employed
individuals:


Length of Time (years)               Annual Income (dollars)
with the Same Company     Less than 40,000   40,000–75,000   More than 75,000   Total

<5                               50                75               25           150
5–10                             25                50               25           100
10 or more                       25                75               50           150
Total                           100               200              100           400


If $X$ denotes the length of service with the same company, and $Y$ the annual income, we
wish to test the hypothesis that $X$ and $Y$ are independent. The expected frequencies are as
follows:


Time (years)                         Expected Frequencies
with the Same Company     <40,000     40,000–75,000     >75,000     Total

<5                          37.5           75              37.5       150
5–10                        25             50              25         100
>10                         37.5           75              37.5       150
Total                      100            200             100         400
Thus

$$U = \frac{(12.5)^2}{37.5} + 0 + \frac{(12.5)^2}{37.5} + 0 + 0 + 0 + \frac{(12.5)^2}{37.5} + 0 + \frac{(12.5)^2}{37.5} = 16.67.$$

The number of degrees of freedom is $(3 - 1)(3 - 1) = 4$, and $\chi^2_{4,0.05} = 9.488$. Since
$16.67 > 9.488$, we reject $H_0$ at level 0.05 and conclude that length of service with a
company is not independent of annual income.
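
The recipe used in Example 1 (expected count = row total times column total divided by $n$, then sum the scaled squared deviations) can be sketched in Python as follows. The function name `chi_square_independence` is an illustrative choice, and the critical value 9.488 is the tabulated value quoted above rather than something the code computes.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c table of observed counts."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E_ij under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Observed counts of Example 1 (rows: length of service; columns: income class).
observed = [[50, 75, 25],
            [25, 50, 25],
            [25, 75, 50]]

u_stat, df = chi_square_independence(observed)
print(round(u_stat, 2), df, u_stat > 9.488)
```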
13.5.2 Kendall's Tau


Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate population.


Definition 1. For any two pairs $(X_i, Y_i)$ and $(X_j, Y_j)$ we say that the relation is perfect
concordance (or agreement) if

$$X_i < X_j \text{ whenever } Y_i < Y_j \quad \text{or} \quad X_i > X_j \text{ whenever } Y_i > Y_j \tag{6}$$

and that the relation is perfect discordance (or disagreement) if

$$X_i > X_j \text{ whenever } Y_i < Y_j \quad \text{or} \quad X_i < X_j \text{ whenever } Y_i > Y_j. \tag{7}$$

Writing $\pi_c$ and $\pi_d$ for the probability of perfect concordance and of perfect discordance,
respectively, we have

$$\pi_c = P\{(X_j - X_i)(Y_j - Y_i) > 0\} \tag{8}$$

and

$$\pi_d = P\{(X_j - X_i)(Y_j - Y_i) < 0\}, \tag{9}$$

and, if the marginal distributions of $X$ and $Y$ are continuous,

$$\begin{aligned} \pi_c &= [P\{Y_i < Y_j\} - P\{X_i > X_j \text{ and } Y_i < Y_j\}] \\ &\quad + [P\{Y_i > Y_j\} - P\{X_i < X_j \text{ and } Y_i > Y_j\}] \\ &= 1 - \pi_d. \end{aligned} \tag{10}$$


Definition 2. The measure of association between the RVs $X$ and $Y$ defined by

$$\tau = \pi_c - \pi_d \tag{11}$$

is known as Kendall's tau.


If the marginal distributions of $X$ and $Y$ are continuous, we may rewrite (11), in view
of (10), as follows:

$$\tau = 1 - 2\pi_d = 2\pi_c - 1. \tag{12}$$

In particular, if $X$ and $Y$ are independent and continuous RVs, then

$$P\{X_i < X_j\} = P\{X_i > X_j\} = \tfrac{1}{2},$$

since then $X_i - X_j$ is a symmetric RV. Then

$$\begin{aligned} \pi_c &= P\{X_i < X_j\}P\{Y_i < Y_j\} + P\{X_i > X_j\}P\{Y_i > Y_j\} \\ &= P\{X_i > X_j\}P\{Y_i < Y_j\} + P\{X_i < X_j\}P\{Y_i > Y_j\} \\ &= \pi_d, \end{aligned}$$

and it follows that $\tau = 0$ for independent continuous RVs.

Note that, in general, $\tau = 0$ does not imply independence. However, for the bivariate
normal distribution $\tau = 0$ if and only if the correlation coefficient $\rho$, between $X$ and $Y$,
is 0, so that $\tau = 0$ if and only if $X$ and $Y$ are independent (Problem 6).

Let

$$\psi((x_1,y_1),(x_2,y_2)) = \begin{cases} 1, & (y_2 - y_1)(x_2 - x_1) > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{13}$$

Then $E\psi((X_1,Y_1),(X_2,Y_2)) = \pi_c = (1 + \tau)/2$, and we see that $\pi_c$ is estimable of degree 2,
with symmetric kernel $\psi$ defined in (13). The corresponding one-sample U-statistic is
given by

$$U((X_1,Y_1),\ldots,(X_n,Y_n)) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \psi((X_i,Y_i),(X_j,Y_j)). \tag{14}$$

Then the corresponding estimator of Kendall's tau is

$$T = 2U - 1 \tag{15}$$

and is called Kendall's sample correlation coefficient.

Note that $-1 \le T \le 1$. To test $H_0$ that $X$ and $Y$ are independent against $H_1$: $X$ and $Y$
are dependent, we reject $H_0$ if $|T|$ is large. Under $H_0$, $\tau = 0$, so that the null distribution of
$T$ is symmetric about 0. Thus we reject $H_0$ at level $\alpha$ if the observed value of $T$, $t$, satisfies
$|t| > t_{\alpha/2}$, where $P\{|T| > t_{\alpha/2} \mid H_0\} = \alpha$.

For small values of $n$ the null distribution can be directly evaluated. Values for
$4 \le n \le 10$ are tabulated by Kendall [51]. Table ST12 gives the values of $S_\alpha$ for which
$P\{S > S_\alpha\} \le \alpha$, where $S = \binom{n}{2}T$, for selected values of $n$ and $\alpha$.

For a direct evaluation of the null distribution we note that the numerical value of $T$ is
clearly invariant under all order-preserving transformations. It is therefore convenient to
order $X$ and $Y$ values and assign them ranks. If we write the pairs from the smallest to the
largest according to, say, $X$ values, then the number of pairs of values of $1 \le i < j \le n$ for
which $Y_j - Y_i > 0$ is the number of concordant pairs, $P$.


Example 2. Let $n = 4$, and let us find the null distribution of $T$. There are $4!$ different
permutations of ranks of $Y$:


Ranks of X values:   1     2     3     4
Ranks of Y values:   $a_1$   $a_2$   $a_3$   $a_4$


where $(a_1, a_2, a_3, a_4)$ is one of the 24 permutations of 1, 2, 3, 4. Since the distribution is
symmetric about 0, we need only compute one half of the distribution.


$P$    $T$       Number of Permutations    $P_{H_0}\{T = t\}$
0     -1.00              1                   1/24
1     -0.67              3                   3/24
2     -0.33              5                   5/24
3      0.00              6                   6/24


Similarly, for $n = 3$ the distribution of $T$ under $H_0$ is as follows:


$P$    $T$       Number of Permutations    $P_{H_0}\{T = t\}$
0     -1.00              1                   1/6
1     -0.33              2                   2/6


Example 3. Two judges rank four essays as follows:


Essay:         1   2   3   4
Judge 1, X:    3   4   2   1
Judge 2, Y:    3   1   4   2


To test $H_0$: rankings of the two judges are independent, let us arrange the rankings of the
first judge from 1 to 4. Then we have:


Judge 1, X:    1   2   3   4
Judge 2, Y:    2   4   3   1


$P$ = number of pairs of rankings for Judge 2 such that for $j > i$, $Y_j - Y_i > 0$ = 2 [the pairs
(2,4) and (2,3)], and

$$t = \frac{2 \cdot 2}{\binom{4}{2}} - 1 = -0.33.$$

Since

$$P_{H_0}\{|T| \ge 0.33\} = \frac{18}{24} = 0.75,$$

we cannot reject $H_0$.
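
For small $n$ the null distribution of $T$ can be obtained by brute-force enumeration of the $n!$ rank permutations, as in Example 2, and then used to compute the $p$-value of Example 3. The sketch below does exactly that; the function names are illustrative, and the approach is only practical for small $n$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations


def concordant_pairs(y):
    """Number of pairs i < j with y[j] > y[i], the X-ranks being in natural order."""
    return sum(1 for i, j in combinations(range(len(y)), 2) if y[j] > y[i])


def kendall_t(y):
    """Kendall's sample coefficient T = 2U - 1, with U = P / C(n, 2)."""
    n = len(y)
    return Fraction(2 * concordant_pairs(y), n * (n - 1) // 2) - 1


def null_distribution(n):
    """Exact null PMF of T, all n! rank permutations being equally likely."""
    counts = Counter(kendall_t(perm) for perm in permutations(range(1, n + 1)))
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}


y_ranks = (2, 4, 3, 1)            # Judge 2's ranks after sorting by Judge 1 (Example 3)
t_obs = kendall_t(y_ranks)
dist = null_distribution(4)
p_value = sum(pr for t, pr in dist.items() if abs(t) >= abs(t_obs))

print(t_obs, p_value)
```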


For large $n$ we can use an extension of Theorem 13.3.3 to the bivariate case to conclude
that $\sqrt{n}(U - \pi_c) \xrightarrow{L} N(0, 4\zeta_1)$, where

$$\zeta_1 = \operatorname{cov}\{\psi((X_1,Y_1),(X_2,Y_2)),\; \psi((X_1,Y_1),(X_3,Y_3))\}.$$

Under $H_0$, it can be shown that

$$\frac{3\sqrt{n(n-1)}}{\sqrt{2(2n+5)}}\,T \xrightarrow{L} N(0,1).$$

See, for example, Kendall [51], Randles and Wolfe [85], or Gibbons [35]. The approximation
is good for $n \ge 8$.


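13.5.3 Spearman's Rank Correlation Coefficient


Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate population. In Section 6.3
we defined the sample correlation coefficient by

$$R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\left\{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2\right\}^{1/2}}, \tag{16}$$

where

$$\bar{X} = n^{-1} \sum_{i=1}^{n} X_i \quad \text{and} \quad \bar{Y} = n^{-1} \sum_{i=1}^{n} Y_i.$$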


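If the sample values $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ are each ranked from 1 to $n$ in
increasing order of magnitude separately, and if the $X$'s and $Y$'s have continuous DFs, we
get a unique set of rankings. The data will then reduce to $n$ pairs of rankings. Let us write

$$R_i = \operatorname{rank}(X_i) \quad \text{and} \quad S_i = \operatorname{rank}(Y_i);$$

then $R_i$ and $S_i \in \{1,2,\ldots,n\}$. Also,

$$\sum_{i=1}^{n} R_i = \sum_{i=1}^{n} S_i = \frac{n(n+1)}{2}, \tag{17}$$

$$\bar{R} = n^{-1} \sum_{i=1}^{n} R_i = \frac{n+1}{2}, \qquad \bar{S} = n^{-1} \sum_{i=1}^{n} S_i = \frac{n+1}{2}, \tag{18}$$

and

$$\sum_{i=1}^{n} (R_i - \bar{R})^2 = \sum_{i=1}^{n} (S_i - \bar{S})^2 = \frac{n(n^2 - 1)}{12}. \tag{19}$$

Substituting in (16), we obtain

$$R = \frac{12 \sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{n(n^2 - 1)} = \frac{12 \sum_{i=1}^{n} R_i S_i}{n(n^2 - 1)} - \frac{3(n+1)}{n-1}. \tag{20}$$

Writing $D_i = R_i - S_i = (R_i - \bar{R}) - (S_i - \bar{S})$, we have

$$\sum_{i=1}^{n} D_i^2 = \sum_{i=1}^{n} (R_i - \bar{R})^2 + \sum_{i=1}^{n} (S_i - \bar{S})^2 - 2 \sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S}),$$

and it follows that

$$R = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}. \tag{21}$$

The statistic $R$ defined in (20) and (21) is called Spearman's rank correlation coefficient
(see also Example 4.5.2).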
From (20) we see that

$$ER = \frac{12}{n(n^2 - 1)}\,E\left(\sum_{i=1}^{n} R_i S_i\right) - \frac{3(n+1)}{n-1}. \tag{22}$$

Under $H_0$, the RVs $X$ and $Y$ are independent, so that the ranks $R_i$ and $S_i$ are also
independent. It follows that

$$E_{H_0}(R_i S_i) = ER_i\,ES_i = \left(\frac{n+1}{2}\right)^2$$

and

$$E_{H_0}R = \frac{12}{n(n^2 - 1)}\,n\left(\frac{n+1}{2}\right)^2 - \frac{3(n+1)}{n-1} = 0. \tag{23}$$


Thus we should reject $H_0$ if the absolute value of $R$ is large, that is, reject $H_0$ if

$$|R| > R_\alpha, \tag{24}$$

where $P_{H_0}\{|R| > R_\alpha\} \le \alpha$. To compute $R_\alpha$ we need the null distribution of $R$. For this
purpose it is convenient to assume, without loss of generality, that $R_i = i$, $i = 1,2,\ldots,n$.
Then $D_i = i - S_i$, $i = 1,2,\ldots,n$. Under $H_0$, $X$ and $Y$ being independent, the $n!$ possible sets of
pairs $(i, S_i)$ of ranks are equally likely. It follows that

$$P_{H_0}\{R = r\} = (n!)^{-1} \times (\text{number of sets of pairs for which } R = r) = \frac{n_r}{n!}, \text{ say.} \tag{25}$$

Note that $-1 \le R \le 1$, and the extreme values can occur only when either the rankings
match, that is, $R_i = S_i$, in which case $R = 1$, or $R_i = n + 1 - S_i$, in which case $R = -1$.
Moreover, one need compute only one half of the distribution, since it is symmetric about 0
(Problem 7).

In the following example we will compute the distribution of $R$ for $n = 3$ and 4. The
exact complete distribution of $\sum_{i=1}^{n} D_i^2$, and hence $R$, for $n \le 10$ has been tabulated by
Kendall [51]. Table ST13 gives the values of $R_\alpha$ for some selected values of $n$ and $\alpha$.


Example 4. Let us first enumerate the null distribution of $R$ for $n = 3$. This is done in the
following table:


$(s_1, s_2, s_3)$    $\sum_i i s_i$    $r = \dfrac{12 \sum_i i s_i}{n(n^2 - 1)} - \dfrac{3(n+1)}{n-1}$

(1,2,3)              14               1.0
(1,3,2)              13               0.5
(2,1,3)              13               0.5


Thus

$$P_{H_0}\{R = r\} = \begin{cases} \frac{1}{6}, & r = 1.0, \\ \frac{2}{6}, & r = 0.5, \\ \frac{2}{6}, & r = -0.5, \\ \frac{1}{6}, & r = -1.0. \end{cases}$$

Similarly, for $n = 4$ we have the following:


$(s_1, s_2, s_3, s_4)$                            $\sum_i i s_i$    $r$     $n_r$    $P_{H_0}\{R = r\}$

(1,2,3,4)                                         30        1.0     1      1/24
(1,3,2,4), (2,1,3,4), (1,2,4,3)                   29        0.8     3      3/24
(2,1,4,3)                                         28        0.6     1      1/24
(1,3,4,2), (1,4,2,3), (2,3,1,4), (3,1,2,4)        27        0.4     4      4/24
(1,4,3,2), (3,2,1,4)                              26        0.2     2      2/24
                                                  25        0.0     2      2/24


The last value is obtained from symmetry.
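
The enumeration in Example 4 can be reproduced by listing all $n!$ permutations and evaluating $R$ from (20) with $R_i = i$. The sketch below also evaluates (21) as a cross-check and applies the result to the rankings $(2, 4, 3, 1)$ of Example 3. The function names are illustrative; exact fractions avoid rounding issues when grouping equal values of $r$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def spearman_r(s):
    """Spearman's R from relation (20), taking R_i = i."""
    n = len(s)
    cross = sum(i * si for i, si in enumerate(s, start=1))
    return Fraction(12 * cross, n * (n * n - 1)) - Fraction(3 * (n + 1), n - 1)


def spearman_r_from_d(s):
    """Spearman's R from relation (21), using D_i = i - S_i."""
    n = len(s)
    d2 = sum((i - si) ** 2 for i, si in enumerate(s, start=1))
    return 1 - Fraction(6 * d2, n * (n * n - 1))


def null_counts(n):
    """Number of permutations n_r giving each value r of R."""
    return Counter(spearman_r(perm) for perm in permutations(range(1, n + 1)))


counts = null_counts(4)
r_obs = spearman_r((2, 4, 3, 1))
p_value = Fraction(sum(c for r, c in counts.items() if abs(r) >= abs(r_obs)),
                   sum(counts.values()))

print(sorted(counts.items(), reverse=True))
print(r_obs, p_value)
```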


Example 5. In Example 3, we see that

$$r = \frac{12 \times 23}{4 \times 15} - \frac{3 \times 5}{3} = -0.4.$$

Since $P_{H_0}\{|R| \ge 0.4\} = 18/24 = 0.75$, we cannot reject $H_0$ at $\alpha = 0.05$ or $\alpha = 0.10$.


For large samples it is possible to use a normal approximation. It can be shown (see,
e.g., Fraser [32, pp. 247–248]) that under $H_0$ the RV

$$Z = \left(12 \sum_{i=1}^{n} R_i S_i - 3n^3\right) n^{-5/2}$$

or, equivalently,

$$Z = R\sqrt{n - 1}$$

has approximately a standard normal distribution. The approximation is good for $n \ge 10$.


PROBLEMS 13.5 


1. A sample of 240 men was classified according to characteristics A and B. Characteristic
A was subdivided into four classes $A_1$, $A_2$, $A_3$, and $A_4$, while B was
subdivided into three classes $B_1$, $B_2$, and $B_3$, with the following result:


        $A_1$   $A_2$   $A_3$   $A_4$
$B_1$ |  12    25    32    11  |  80
$B_2$ |  17    18    22    23  |  80
$B_3$ |  21    17    16    26  |  80
         50    60    70    60  | 240


Is there evidence to support the theory that A and B are independent?


2. The following data represent the blood types and ethnic groups of a sample of Iraqi 
citizens: 


Blood Type 
Ethnic Group O A B AB 
Kurd 531 450 293 226 
Arab 174 150 133 36 
Jew 42 26 26 8 
Turkoman 47 49 22 10 
Ossetian 50 59 26 15


Is there evidence to conclude that blood type is independent of ethnic group? 


3. In a public opinion poll, a random sample of 500 American adults across the country
was asked the following question: “Do you believe that there was a concerted
effort to cover up the Watergate scandal? Answer yes, no, or no opinion.” The
responses according to political beliefs were as follows:


Political                     Response
Affiliation       Yes     No     No Opinion
Republican         45     75        30         150
Independent        85     45        20         150
Democrat          140     30        30         200
                  270    150        80         500


Test the hypothesis that attitude toward the Watergate cover-up is independent of
political party affiliation.

4. A random sample of 100 families in Bowling Green, Ohio, showed the following
distribution of home ownership by family income:


Annual Income (dollars) 
Residential Less than 30,000— 50,000 


Status 30,000 50,000 or Above 
Home Owner 10 15 30 
Renter 8 17 20 


Is home ownership in Bowling Green independent of family income? 

5. In a flower show the judges agreed that five exhibits were outstanding, and these
were numbered arbitrarily from 1 to 5. Three judges each arranged these five
exhibits in order of merit, giving the following rankings:


Judge A: 5 3 1 2 4
Judge B: 3 1 5 4 2
Judge C: 5 2 3 1 4


Compute the average values of Spearman's rank correlation coefficient $R$ and
Kendall's sample tau coefficient $T$ from the three possible pairs of rankings.

6. For the bivariate normally distributed RV $(X, Y)$ show that $\tau = 0$ if and only if $X$ and
$Y$ are independent. [Hint: Show that $\tau = (2/\pi) \sin^{-1} \rho$, where $\rho$ is the correlation
coefficient between $X$ and $Y$.]

7. Show that the distribution of Spearman's rank correlation coefficient $R$ is symmetric
about 0 under $H_0$.

8. In Problem 5 test the null hypothesis that rankings of judge A and judge C are
independent. Use both Kendall's tau and Spearman's rank correlation tests.

9. A random sample of 12 couples showed the following distribution of heights:


          Height (in.)                 Height (in.)
Couple   Husband   Wife     Couple   Husband   Wife
1          80       72        7        74       68
2          70       60        8        71       71
3          73       76        9        63       61
4          72       62       10        64       65
5          62       63       11        68       66
6          65       46       12        67       67

(a) Compute T. 

(b) Compute R. 

(c) Test the hypothesis that the heights of husband and wife are independent, using 
T as well as R. In each case use the normal approximation. 


13.6 SOME APPLICATIONS OF ORDER STATISTICS 


In this section we consider some applications of order statistics. We are mainly inter- 
ested in three applications, namely, tolerance intervals for distributions, coverages, and 
confidence interval estimates for quantiles and location parameters. 


Definition 1. Let $F$ be a continuous DF. A tolerance interval for $F$ with tolerance coefficient
$\gamma$ is a random interval such that the probability is $\gamma$ that this random interval covers
at least a specific percentage ($100p$) of the distribution.


Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from $F$, and let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the corresponding
set of order statistics. If the end points of the tolerance interval are two order
statistics $X_{(r)}, X_{(s)}$, $r < s$, we have

$$P\{P\{X_{(r)} < X < X_{(s)}\} \ge p\} = \gamma. \tag{1}$$

Since $F$ is continuous, $F(X)$ is $U(0,1)$, and we have

$$P\{X_{(r)} < X < X_{(s)}\} = F(X_{(s)}) - F(X_{(r)}) = U_{(s)} - U_{(r)}, \tag{2}$$

where $U_{(r)}, U_{(s)}$ are the order statistics from $U(0,1)$. Thus (1) reduces to

$$P\{U_{(s)} - U_{(r)} \ge p\} = \gamma. \tag{3}$$

The statistic $V = U_{(s)} - U_{(r)}$, $1 \le r < s \le n$, is called the coverage of the interval
$(X_{(r)}, X_{(s)})$. More precisely, the differences $V_k = F(X_{(k)}) - F(X_{(k-1)}) = U_{(k)} - U_{(k-1)}$, for
$k = 1,2,\ldots,n+1$, where $U_{(0)} = 0$ and $U_{(n+1)} = 1$ (that is, $X_{(0)} = -\infty$ and $X_{(n+1)} = +\infty$),
are called elementary coverages.

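Since the joint PDF of $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ is given by

$$f(u_1, u_2, \ldots, u_n) = \begin{cases} n!, & 0 < u_1 < u_2 < \cdots < u_n < 1, \\ 0, & \text{otherwise,} \end{cases}$$

the joint PDF of $V_1, V_2, \ldots, V_n$ is easily seen to be

$$h(v_1, v_2, \ldots, v_n) = \begin{cases} n!, & v_i \ge 0,\; i = 1,2,\ldots,n,\; \sum_{i=1}^{n} v_i \le 1, \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

Note that $h$ is symmetric in its arguments. Consequently, the $V_i$'s are exchangeable RVs and
the distribution of every sum of $r$, $r \le n$, of these coverages is the same and, in particular,
it is the distribution of $U_{(r)} = \sum_{j=1}^{r} V_j$, namely,

$$g_r(u) = \begin{cases} n\binom{n-1}{r-1} u^{r-1}(1 - u)^{n-r}, & 0 < u < 1, \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$

The common distribution of elementary coverages is

$$g_1(u) = n(1 - u)^{n-1}, \quad 0 < u < 1, \qquad = 0, \text{ otherwise.}$$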


Thus $EV_i = 1/(n+1)$ and $\sum_{i=1}^{r} EV_i = r/(n+1)$. This may be interpreted as follows:
The order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ partition the area under the PDF in $n + 1$ parts such
that each part has the same average (expected) area.

The sum of any $r$ successive elementary coverages $V_{i+1}, V_{i+2}, \ldots, V_{i+r}$ is called an
$r$-coverage. Clearly

$$\sum_{j=1}^{r} V_{i+j} = U_{(i+r)} - U_{(i)}, \qquad i + r \le n, \tag{6}$$

and, in particular, $U_{(s)} - U_{(r)} = \sum_{j=r+1}^{s} V_j$. Since the $V$'s are exchangeable it follows that

$$U_{(s)} - U_{(r)} \overset{d}{=} U_{(s-r)} \tag{7}$$

with PDF

$$g_{s-r}(u) = n\binom{n-1}{s-r-1} u^{s-r-1}(1 - u)^{n-s+r}, \qquad 0 < u < 1.$$

From (3), therefore,

$$\gamma = \int_{p}^{1} g_{s-r}(u)\,du = \sum_{i=0}^{s-r-1} \binom{n}{i} p^i (1 - p)^{n-i}, \tag{8}$$

where the last equality follows from (5.3.48). Given $n$, $p$, $\gamma$ it may not always be possible
to find $s - r$ to satisfy (8).


Example 1. Let $s = n$ and $r = 1$. Then

$$\gamma = \sum_{i=0}^{n-2} \binom{n}{i} p^i (1 - p)^{n-i} = 1 - p^n - np^{n-1}(1 - p).$$

If $p = 0.8$, $n = 5$, $r = 1$, then

$$\gamma = 1 - (0.8)^5 - 5(0.8)^4(0.2) = 0.263.$$

Thus the interval $(X_{(1)}, X_{(5)})$ in this case defines a 26 percent tolerance interval for 0.80
probability under the distribution (of $X$).


Example 2. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from a continuous DF $F$. Let us find $r$ and $s$,
$r < s$, such that $(X_{(r)}, X_{(s)})$ is a 90 percent tolerance interval for 0.50 probability under $F$.

We have

$$\gamma = \sum_{i=0}^{s-r-1} \binom{5}{i} \left(\frac{1}{2}\right)^{i} \left(\frac{1}{2}\right)^{5-i} = \sum_{i=0}^{s-r-1} \binom{5}{i} \left(\frac{1}{2}\right)^{5}.$$

It follows that, if we choose $s - r = 4$, then $\gamma = 0.81$; and if we choose $s - r = 5$, then
$\gamma = 0.969$. In this case, we must settle for an interval with tolerance coefficient 0.969,
exceeding the desired value 0.90.
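
Relation (8) is a binomial lower-tail sum, so the tolerance coefficients in Examples 1 and 2 are easy to verify numerically. The following sketch uses a hypothetical helper `tolerance_coefficient(n, p, k)`, where `k` stands for $s - r$; it requires Python 3.8 or later for `math.comb`.

```python
from math import comb


def tolerance_coefficient(n, p, k):
    """gamma from relation (8): sum_{i=0}^{k-1} C(n, i) p^i (1-p)^(n-i), with k = s - r."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


g_example1 = tolerance_coefficient(5, 0.8, 4)    # Example 1: n = 5, s = 5, r = 1
g_example2a = tolerance_coefficient(5, 0.5, 4)   # Example 2 with s - r = 4
g_example2b = tolerance_coefficient(5, 0.5, 5)   # Example 2 with s - r = 5

print(round(g_example1, 3), round(g_example2a, 3), round(g_example2b, 3))
```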


In general, given $p$, $0 < p < 1$, it is possible to choose a sufficiently large sample of
size $n$ and a corresponding value of $s - r$ such that with probability $\gamma$ an interval of the
form $(X_{(r)}, X_{(s)})$ covers at least $100p$ percent of the distribution. If $s - r$ is specified as a
function of $n$, one chooses the smallest sample size $n$.

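Example 3. Let $p = \frac{3}{4}$ and $\gamma = 0.75$. Suppose that we want to choose the smallest sample
size required such that $(X_{(2)}, X_{(n)})$ covers at least 75 percent of the distribution. Thus we
want the smallest $n$ to satisfy

$$0.75 \le \sum_{i=0}^{n-3} \binom{n}{i} \left(\frac{3}{4}\right)^{i} \left(\frac{1}{4}\right)^{n-i}.$$

From Table ST1 of binomial distributions we see that $n = 15$ (the sum is 0.719 for $n = 14$
and 0.764 for $n = 15$).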
We next consider the use of order statistics in constructing confidence intervals for
population quantiles. Let $X$ be an RV with a continuous DF $F$, and let $0 < p < 1$. Then the quantile
of order $p$, $\mathfrak{z}_p$, satisfies

$$F(\mathfrak{z}_p) = p. \tag{9}$$

Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on $X$. Then the number of $X_i$'s $< \mathfrak{z}_p$ is an
RV that has a binomial distribution with parameters $n$ and $p$. Similarly, the number of $X_i$'s
that are at least $\mathfrak{z}_p$ has a binomial distribution with parameters $n$ and $1 - p$.

Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the set of order statistics for the sample. Then

$$P\{X_{(r)} \le \mathfrak{z}_p\} = P\{\text{At least } r \text{ of the } X_i\text{'s} \le \mathfrak{z}_p\} = \sum_{i=r}^{n} \binom{n}{i} p^i (1 - p)^{n-i}. \tag{10}$$

Similarly

$$\begin{aligned} P\{X_{(s)} \ge \mathfrak{z}_p\} &= P\{\text{At least } n - s + 1 \text{ of the } X_i\text{'s} \ge \mathfrak{z}_p\} \\ &= P\{\text{At most } s - 1 \text{ of the } X_i\text{'s} < \mathfrak{z}_p\} \\ &= \sum_{i=0}^{s-1} \binom{n}{i} p^i (1 - p)^{n-i}. \end{aligned} \tag{11}$$

It follows from (10) and (11) that

$$\begin{aligned} P\{X_{(r)} \le \mathfrak{z}_p \le X_{(s)}\} &= P\{X_{(s)} \ge \mathfrak{z}_p\} - P\{X_{(r)} > \mathfrak{z}_p\} \\ &= P\{X_{(r)} \le \mathfrak{z}_p\} - 1 + P\{X_{(s)} \ge \mathfrak{z}_p\} \\ &= \sum_{i=r}^{s-1} \binom{n}{i} p^i (1 - p)^{n-i}. \end{aligned} \tag{12}$$

It is easy to determine a confidence interval for $\mathfrak{z}_p$ from (12), once the confidence level is
given. In practice, one determines $r$ and $s$ such that $s - r$ is as small as possible, subject to
the condition that the level is $1 - \alpha$.


Example 4. Suppose that we want a confidence interval for the median ($p = \frac{1}{2}$), based on
a sample of size 7 with confidence level 0.90. It suffices to find $r$ and $s$, $r < s$, such that

$$\sum_{i=r}^{s-1} \binom{7}{i} \left(\frac{1}{2}\right)^{7} \ge 0.90.$$

By trial and error, using the probability distribution $b(7, \frac{1}{2})$ we see that we can choose
$s = 7$, $r = 2$ or $r = 1$, $s = 6$; in either case $s - r$ is minimum ($= 5$), and the confidence level
is at least 0.92.
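
The trial-and-error search of Example 4 can be automated with relation (12). The sketch below is one possible way to do it, with hypothetical helper names `coverage` and `shortest_intervals`; it scans widths $s - r$ from the smallest upward and returns every pair $(r, s)$ of that width whose coverage reaches the requested level.

```python
from fractions import Fraction
from math import comb


def coverage(n, p, r, s):
    """P{X_(r) <= z_p <= X_(s)} from relation (12)."""
    p = Fraction(p)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, s))


def shortest_intervals(n, p, level):
    """All pairs (r, s) with the smallest s - r whose coverage is at least `level`."""
    for width in range(1, n):
        found = [(r, r + width) for r in range(1, n - width + 1)
                 if coverage(n, p, r, r + width) >= level]
        if found:
            return found
    return []


pairs = shortest_intervals(7, Fraction(1, 2), Fraction(9, 10))
print(pairs, float(coverage(7, Fraction(1, 2), 2, 7)))
```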


Example 5. Let us compute the number of observations required for $(X_{(1)}, X_{(n)})$ to be a
0.95 level confidence interval for the median, that is, we want to find $n$ such that

$$P\{X_{(1)} \le \mathfrak{z}_{1/2} \le X_{(n)}\} \ge 0.95.$$

It suffices to find $n$ such that

$$\sum_{i=1}^{n-1} \binom{n}{i} \left(\frac{1}{2}\right)^{n} \ge 0.95.$$

It follows from Table ST1 that $n = 6$.


Finally we consider applications of order statistics to constructing confidence intervals 
for a location parameter. For this purpose we will use the method of test inversion discussed 
in Chapter 11. We first consider confidence estimation based on the sign test of location. 

Let $X_1, X_2, \ldots, X_n$ be a random sample from a symmetric, continuous DF $F(x - \theta)$ and
suppose we wish to find a confidence interval for $\theta$. Let $R^+(\mathbf{X} - \theta_0)$ = # of $X_i$'s $> \theta_0$ be
the sign-test statistic for testing $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$. Clearly, $R^+(\mathbf{X} - \theta_0) \sim
b(n, 1/2)$ under $H_0$. The sign test rejects $H_0$ if

$$\min\{R^+(\mathbf{X} - \theta_0),\; R^+(\theta_0 - \mathbf{X})\} \le c \tag{13}$$

for some integer $c$ to be determined from the level of the test. Let $r = c + 1$. Then any
value of $\theta$ is acceptable provided it is greater than the $r$th smallest observation and smaller
than the $r$th largest observation, giving as confidence interval

$$X_{(r)} < \theta < X_{(n+1-r)}. \tag{14}$$

If we want level $1 - \alpha$ to be associated with (14), we choose $c$ so that the level of test
(13) is $\alpha$.


Example 6. The following 12 observations come from a symmetric, continuous DF
$F(x - \theta)$:


223, -380, -94, -179, -194, 25, -177, -274, -496, -507, -20, 122.


We wish to obtain a 95% confidence interval for $\theta$. The sign test rejects $H_0$ if $R^+(\mathbf{X}) \ge 10$ or
$R^+(\mathbf{X}) \le 2$ at level 0.05. Thus

$$P\{3 \le R^+(\mathbf{X} - \theta) < 10\} = 1 - 2(0.0193) = 0.9614 > 0.95.$$

It follows that a 95% confidence interval for $\theta$ is given by $(X_{(3)}, X_{(10)})$ or $(-380, 25)$.
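
The inversion of the sign test described by (13) and (14) can be sketched as follows. The function name `sign_test_interval` is an illustrative choice. One assumption is made about the data: the fifth observation is entered as -194, which is the value consistent with the intervals quoted in Examples 6 and 7.

```python
from math import comb


def sign_test_interval(xs, alpha):
    """Confidence interval (X_(r), X_(n+1-r)) for theta obtained by inverting the sign test.

    c is the largest integer with P{Bin(n, 1/2) <= c} <= alpha / 2, and r = c + 1.
    Returns (lower, upper, achieved confidence level).
    """
    n = len(xs)
    tail, c = 0.0, -1
    for k in range(n + 1):
        pk = comb(n, k) / 2**n
        if tail + pk > alpha / 2:
            break
        tail += pk
        c = k
    if c < 0:
        raise ValueError("sample too small for the requested level")
    r = c + 1
    ordered = sorted(xs)
    return ordered[r - 1], ordered[n - r], 1 - 2 * tail


# Data of Example 6; the value -194 is an assumption (see the note above).
data = [223, -380, -94, -179, -194, 25, -177, -274, -496, -507, -20, 122]
lower, upper, level = sign_test_interval(data, 0.05)
print(lower, upper, round(level, 4))
```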


We next consider the Wilcoxon signed-ranks test of $H_0: \theta = \theta_0$ to construct a confidence
interval for $\theta$. The test statistic in this case is $T^+$ = sum of ranks of positive
$(X_i - \theta_0)$'s in the ordered $|X_i - \theta_0|$'s. From (13.3.4)

$$T^+ = \sum_{1 \le i \le j \le n} I_{[X_i + X_j > 2\theta_0]} = \text{number of } \frac{X_i + X_j}{2} > \theta_0.$$

Let $T_{ij} = (X_i + X_j)/2$, $1 \le i \le j \le n$, and order the $N = \binom{n+1}{2}$ $T_{ij}$'s in increasing order of
magnitude

$$T_{(1)} < T_{(2)} < \cdots < T_{(N)}.$$

Then using the argument that converts (13) to (14) we see that a confidence interval for $\theta$
is given by

$$T_{(r)} < \theta < T_{(N+1-r)}. \tag{15}$$

Critical values $c$ are taken from Table ST10.


Example 7. For the data in Example 6, the Wilcoxon signed-rank test rejects $H_0: \theta = \theta_0$
at level 0.05 if $T^+ > 64$ or $T^+ < 14$. Thus

$$P\{14 \le T^+(\mathbf{X} - \theta) \le 64\} \ge 0.95.$$

It follows that a 95% confidence interval for $\theta$ is given by $[T_{(14)}, T_{(64)}] = [-336.5, -20]$.
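
The ordered averages $T_{(1)} < \cdots < T_{(N)}$ used in Example 7 can be generated directly. In the sketch below, `walsh_averages` is an illustrative helper name; the order-statistic indices 14 and 64 are simply those quoted in Example 7, and the fifth observation is again taken to be -194, an assumption consistent with the intervals quoted in Examples 6 and 7.

```python
from itertools import combinations_with_replacement


def walsh_averages(xs):
    """Sorted averages (X_i + X_j) / 2 over all pairs with i <= j."""
    return sorted((a + b) / 2 for a, b in combinations_with_replacement(xs, 2))


# Data of Example 6; the value -194 is an assumption (see the note above).
data = [223, -380, -94, -179, -194, 25, -177, -274, -496, -507, -20, 122]

t = walsh_averages(data)
n_averages = len(t)                 # N = n(n + 1) / 2
lower, upper = t[14 - 1], t[64 - 1]  # T_(14) and T_(64), as quoted in Example 7

print(n_averages, lower, upper)
```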


PROBLEMS 13.6 


1. Find the smallest values of $n$ such that the intervals (a) $(X_{(1)}, X_{(n)})$ and
(b) $(X_{(2)}, X_{(n-1)})$ contain the median with probability $\ge 0.90$.


2. Find the smallest sample size required such that $(X_{(1)}, X_{(n)})$ covers at least 90 percent
of the distribution with probability $\ge 0.98$.


3. Find the relation between $n$ and $p$ such that $(X_{(1)}, X_{(n)})$ covers at least $100p$ percent
of the distribution with probability $\ge 1 - p$.


4. Given $\gamma$, $\delta$, $p_0$, $p_1$ with $p_1 > p_0$, find the smallest $n$ such that

$$P\{F(X_{(s)}) - F(X_{(r)}) \ge p_0\} \ge \gamma$$

and

$$P\{F(X_{(s)}) - F(X_{(r)}) \ge p_1\} \le \delta.$$

Find also $s - r$.
[Hint: Use the normal approximation to the binomial distribution.]


5. In Problem 4 find the smallest $n$ and the associated value of $s - r$ if $\gamma = 0.95$, $\delta =
0.10$, $p_1 = 0.75$, $p_0 = 0.50$.


6. Let $X_1, X_2, \ldots, X_7$ be a random sample from a continuous DF $F$. Compute:
(a) P(X) <3.5 <X(7). 
(b) P(X, nea) 
(c) P(X) <3.8 < X(6)). 

7. Let $X_1, X_2, \ldots, X_n$ be iid with common continuous DF $F$.
(a) What is the distribution of

$$F(X_{(n-1)}) - F(X_{(j)}) + F(X_{(i)}) - F(X_{(2)})$$

for $2 \le i < j \le n - 1$?
(b) What is the distribution of $[F(X_{(n)}) - F(X_{(2)})]/[F(X_{(n)}) - F(X_{(1)})]$?

13.7 ROBUSTNESS 


Most of the statistical inference problems treated in this book are parametric in nature. We 
have assumed that the functional form of the distribution being sampled is known except 
for a finite number of parameters. It is to be expected that any estimator or test of hypothe- 
sis concerning the unknown parameter constructed on this assumption will perform better 
than the corresponding nonparametric procedure, provided that the underlying assump- 
tions are satisfied. It is therefore of interest to know how well the parametric optimal tests 
or estimators constructed for one population perform when the basic assumptions are mod- 
ified. If we can construct tests or estimators that perform well for a variety of distributions, 
for example, there would be little point in using the corresponding nonparametric method 
unless the assumptions are seriously violated. 

In practice, one makes many assumptions in parametric inference, and any one or all 
of these may be violated. Thus one seldom has accurate knowledge about the true under- 
lying distribution. Similarly, the assumption of mutual independence or even identical 
distribution may not hold. Any test or estimator that performs well under modifications of 
underlying assumptions is usually referred to as robust. 

In this section we will first consider the effect that slight variations in model assumptions
have on some common parametric estimators and tests of hypotheses. Next we will
consider some corresponding nonparametric competitors and show that they are quite
robust.


13.7.1 Effect of Deviations from Model Assumptions on Some Parametric Proce- 
dures 


Let us first consider the effect of contamination on the sample mean as an estimator of the
population mean.

The most commonly used estimator of the population mean $\mu$ is the sample mean $\bar{X}$.
It has the property of unbiasedness for all populations with finite mean. For many parent
populations (normal, Poisson, Bernoulli, gamma, etc.) it is a complete sufficient statistic
and hence a UMVUE. Moreover, it is consistent and has an asymptotic normal distribution
whenever the conditions of the central limit theorem are satisfied. Nevertheless, the sample
mean is affected by extreme observations, and a single observation that is either too
large or too small may make $\bar{X}$ worthless as an estimator of $\mu$. Suppose, for example, that
$X_1, X_2, \ldots, X_n$ is a sample from some normal population. Occasionally something happens
to the system, and a wild observation is obtained; that is, suppose one is sampling from
$N(\mu, \sigma^2)$, say, $100\alpha$ percent of the time and from $N(\mu, k\sigma^2)$, where $k > 1$, $(1 - \alpha)100$
percent of the time. Here both $\mu$ and $\sigma^2$ are unknown, and one wishes to estimate $\mu$. In
this case one is really sampling from the density function


    $f(x) = \alpha f_0(x) + (1 - \alpha) f_1(x),$    (1)

where $f_0$ is the PDF of $N(\mu, \sigma^2)$, and $f_1$, the PDF of $N(\mu, k\sigma^2)$. Clearly,


    $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$    (2)

is still unbiased for $\mu$. If $\alpha$ is nearly 1, there is no problem since the underlying distribution
is nearly $N(\mu, \sigma^2)$, and $\bar{X}$ is nearly the UMVUE of $\mu$ with variance $\sigma^2/n$. If $1 - \alpha$ is large
(that is, not nearly 0), then, since one is sampling from $f$, the variance of $X_1$ is $\sigma^2$ with
probability $\alpha$ and is $k\sigma^2$ with probability $1 - \alpha$, and we have


    $\operatorname{var}_f(\bar{X}) = \frac{1}{n}\operatorname{var}(X_1) = \frac{\sigma^2}{n}[\alpha + (1 - \alpha)k].$    (3)


If $k(1 - \alpha)$ is large, $\operatorname{var}_f(\bar{X})$ is large and we see that even an occasional wild observation
makes $\bar{X}$ subject to a sizable error. The presence of an occasional observation from
$N(\mu, k\sigma^2)$ is frequently referred to as contamination. The problem is that we do not know,
in practice, the distribution of the wild observations and hence we do not know the PDF $f$.
It is known that the sample median is a much better estimator than the mean in the presence
of extreme values. In the contamination model discussed above, if we use $Z_{1/2}$, the
sample median of the $X_i$'s, as an estimator of $\mu$ (which is the population median), then for
large $n$


    $E(Z_{1/2} - \mu)^2 = \operatorname{var}(Z_{1/2}) \approx \frac{1}{4n}\,\frac{1}{[f(\mu)]^2}.$    (4)

(See Theorem 7.5.2 and Remark 7.5.7.) Since

    $f(\mu) = \alpha f_0(\mu) + (1 - \alpha) f_1(\mu) = \frac{\alpha}{\sigma\sqrt{2\pi}} + \frac{1 - \alpha}{\sigma\sqrt{2\pi k}} = \frac{1}{\sigma\sqrt{2\pi}}\left(\alpha + \frac{1 - \alpha}{\sqrt{k}}\right),$

we have

    $\operatorname{var}(Z_{1/2}) \approx \frac{\pi\sigma^2}{2n}\,\frac{1}{\{\alpha + [(1 - \alpha)/\sqrt{k}]\}^2}.$    (5)


As $k \to \infty$, $\operatorname{var}(Z_{1/2}) \approx \pi\sigma^2/(2n\alpha^2)$. If there is no contamination, $\alpha = 1$ and $\operatorname{var}(Z_{1/2}) \approx
\pi\sigma^2/2n$. Also,


    $\frac{\pi\sigma^2/(2n\alpha^2)}{\pi\sigma^2/(2n)} = \frac{1}{\alpha^2},$

which will be close to 1 if $\alpha$ is close to 1. Thus the estimator $Z_{1/2}$ will not be greatly
affected by how large $k$ is, that is, how wild the observations are. We have


    $\frac{\operatorname{var}(\bar{X})}{\operatorname{var}(Z_{1/2})} = \frac{2}{\pi}\,[\alpha + (1 - \alpha)k]\left[\alpha + \frac{1 - \alpha}{\sqrt{k}}\right]^2 \to \infty$ as $k \to \infty$.


Indeed, $\operatorname{var}(\bar{X}) \to \infty$ as $k \to \infty$, whereas $\operatorname{var}(Z_{1/2}) \to \pi\sigma^2/(2n\alpha^2)$ as $k \to \infty$. One can
check that, when $k = 9$ and $\alpha = 0.915$, the two variances are (approximately) equal. As
$k$ becomes larger than 9 or $\alpha$ smaller than 0.915, $Z_{1/2}$ becomes a better estimator of $\mu$
than $\bar{X}$.
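
The comparison of the two large-sample variances is easy to tabulate. The sketch below simply evaluates the ratio of (3) to (5); the function name is an illustrative choice, not notation from the text. Evaluated at k = 9 under these formulas, the ratio is about 0.95 at α = 0.915 and reaches 1 nearer α = 0.90, so the equality quoted above should be read as a rough one.

```python
import math


def var_ratio_mean_to_median(alpha, k):
    """Large-sample var(sample mean) / var(sample median) under the
    contaminated normal model alpha*N(mu, s^2) + (1 - alpha)*N(mu, k*s^2)."""
    mean_factor = alpha + (1 - alpha) * k                      # from (3)
    median_factor = (alpha + (1 - alpha) / math.sqrt(k)) ** 2  # from (5)
    return (2 / math.pi) * mean_factor * median_factor


for alpha in (1.0, 0.915, 0.90, 0.80):
    print(alpha, round(var_ratio_mean_to_median(alpha, 9), 3))
```

A ratio above 1 means the median has the smaller large-sample variance for that pair (α, k).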

There are other flaws as well. Suppose, for example, that $X_1, X_2, \ldots, X_n$ is a sample
from $U(0, \theta)$, $\theta > 0$. Then both $\bar{X}$ and $T(\mathbf{X}) = (X_{(1)} + X_{(n)})/2$, where $X_{(1)} =
\min(X_1, \ldots, X_n)$, $X_{(n)} = \max(X_1, \ldots, X_n)$, are unbiased for $EX = \theta/2$. Also, $\operatorname{var}_\theta(\bar{X}) =
\operatorname{var}(X)/n = \theta^2/[12n]$, and one can show that $\operatorname{var}(T) = \theta^2/[2(n + 1)(n + 2)]$. It follows
that the efficiency of $\bar{X}$ relative to that of $T$ is


    $\operatorname{eff}_\theta(\bar{X} \mid T) = \frac{\operatorname{var}_\theta(T)}{\operatorname{var}_\theta(\bar{X})} = \frac{6n}{(n + 1)(n + 2)} < 1$ if $n > 2$.


In fact, $\operatorname{eff}_\theta(\bar{X} \mid T) \to 0$ as $n \to \infty$, so that in sampling from a uniform parent $\bar{X}$ is much
worse than $T$, even for moderately large values of $n$.
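
To see how quickly this efficiency falls, the ratio of the two variances quoted above can be evaluated exactly. The following is a small sketch; the function name is hypothetical and θ cancels, so the variances are expressed in units of θ².

```python
from fractions import Fraction


def eff_mean_vs_midrange(n):
    """Efficiency of the sample mean relative to the midrange T for U(0, theta)."""
    var_midrange = Fraction(1, 2 * (n + 1) * (n + 2))   # var(T) / theta^2
    var_mean = Fraction(1, 12 * n)                      # var(mean) / theta^2
    return var_midrange / var_mean                      # equals 6n / ((n+1)(n+2))


for n in (2, 5, 10, 100):
    print(n, eff_mean_vs_midrange(n), float(eff_mean_vs_midrange(n)))
```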

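Let us next turn our attention to the estimation of standard deviation. Let $X_1, X_2, \ldots, X_n$
be a sample from $N(\mu, \sigma^2)$. Then the MLE of $\sigma$ is


    $\hat{\sigma} = \left\{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}\right\}^{1/2} = \left(\frac{n - 1}{n}\right)^{1/2} S.$    (6)


Note that the lower bound for the variance of any unbiased estimator for $\sigma$ is $\sigma^2/2n$.
Although $\hat{\sigma}$ is not unbiased, the estimator


    $S_1 = \sqrt{\frac{n}{2}}\,\frac{\Gamma[(n - 1)/2]}{\Gamma(n/2)}\,\hat{\sigma} = \sqrt{\frac{n - 1}{2}}\,\frac{\Gamma[(n - 1)/2]}{\Gamma(n/2)}\,S$    (7)


is unbiased for $\sigma$. Also,

    $\operatorname{var}(S_1) = \sigma^2\left\{\frac{n - 1}{2}\left(\frac{\Gamma[(n - 1)/2]}{\Gamma(n/2)}\right)^2 - 1\right\} = \frac{\sigma^2}{2n} + O\!\left(\frac{1}{n^2}\right).$    (8)


Thus the efficiency of $S_1$ (relative to the estimator with least variance $= \sigma^2/2n$) is


    $\operatorname{eff}(S_1) = \frac{\sigma^2/2n}{\operatorname{var}(S_1)} = \frac{1}{1 + \sigma^{-2}O(2/n)} < 1$

and $\to 1$ as $n \to \infty$. For small $n$, the efficiency of $S_1$ is considerably smaller than 1. Thus,
for $n = 2$, $\operatorname{eff}(S_1) = 1/[2(\pi - 2)] = 0.438$ and, for $n = 3$, $\operatorname{eff}(S_1) = \pi/[6(4 - \pi)] = 0.61$.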
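
The two numerical values just quoted can be checked from the exact variance in (8). The sketch below is only an illustration (the function name is hypothetical); it works on the log scale with `math.lgamma` so that large n does not overflow the gamma function.

```python
import math


def eff_s1(n):
    """Efficiency of the unbiased estimator S_1 relative to the bound sigma^2/(2n)."""
    # c^2 = ((n-1)/2) * (Gamma((n-1)/2) / Gamma(n/2))^2, computed on the log scale
    log_c2 = math.log((n - 1) / 2) + 2 * (math.lgamma((n - 1) / 2) - math.lgamma(n / 2))
    var_over_sigma2 = math.expm1(log_c2)     # var(S_1)/sigma^2 = c^2 - 1, as in (8)
    return (1 / (2 * n)) / var_over_sigma2


for n in (2, 3, 10, 100):
    print(n, round(eff_s1(n), 3))
```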
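Yet another estimator of $\sigma$ is the sample mean deviation


    $S_2 = \frac{1}{n}\sum_{i=1}^{n} |X_i - \bar{X}|.$    (9)

Note that

    $E\left\{\sqrt{\frac{\pi}{2}}\,\frac{1}{n}\sum_{i=1}^{n}|X_i - \mu|\right\} = \sqrt{\frac{\pi}{2}}\,E|X_i - \mu| = \sigma,$

and


    $\operatorname{var}\left\{\sqrt{\frac{\pi}{2}}\,\frac{1}{n}\sum_{i=1}^{n}|X_i - \mu|\right\} = \frac{\pi - 2}{2n}\,\sigma^2.$    (10)


If $n$ is large enough so that $\bar{X} \approx \mu$, we see that $S_3 = \sqrt{\pi/2}\,S_2$ is nearly unbiased for $\sigma$
with variance $[(\pi - 2)/2n]\sigma^2$. The efficiency of $S_3$ is


    $\frac{\sigma^2/(2n)}{\sigma^2[(\pi - 2)/(2n)]} = \frac{1}{\pi - 2} < 1.$

For large $n$, the efficiency of $S_1$ relative to $S_3$ is

    $\frac{\operatorname{var}(S_3)}{\operatorname{var}(S_1)} = \frac{[(\pi - 2)/(2n)]\sigma^2}{\sigma^2/(2n) + O(1/n^2)} = \frac{\pi - 2}{1 + \sigma^{-2}O(2/n)} > 1.$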


Now suppose that there is some contamination. As before, let us suppose that for a
proportion $\alpha$ of the time we sample from $N(\mu, \sigma^2)$ and for a proportion $1 - \alpha$ of the time we
get a wild observation from $N(\mu, k\sigma^2)$, $k > 1$. Assuming that both $\mu$ and $\sigma^2$ are unknown,
suppose that we wish to estimate $\sigma$. In the notation used above, let


    $f(x) = \alpha f_0(x) + (1 - \alpha) f_1(x),$


where $f_0$ is the PDF of $N(\mu, \sigma^2)$, and $f_1$, the PDF of $N(\mu, k\sigma^2)$. Let us see how even small
contamination can make the maximum likelihood estimate $\hat{\sigma}$ of $\sigma$ quite useless.

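If $\hat{\theta}$ is the MLE of $\theta$, and $\psi$ is a function of $\theta$, then $\psi(\hat{\theta})$ is the MLE of $\psi(\theta)$. In view
of (7.5.7) we get


    $E(\hat{\sigma} - \sigma)^2 \approx \frac{1}{4\sigma^2}\,E(\hat{\sigma}^2 - \sigma^2)^2$    (11)

and

    $E(\hat{\sigma}^2 - \sigma^2)^2 \approx \frac{\mu_4 - \mu_2^2}{n}$    (12)

(dropping the other two terms with $n^2$ and $n^3$ in the denominator), so that


    $E(\hat{\sigma} - \sigma)^2 \approx \frac{1}{4n\sigma^2}(\mu_4 - \mu_2^2).$    (13)

For the density $f$, we see that

    $\mu_4 = 3\sigma^4[\alpha + k^2(1 - \alpha)]$    (14)

and

    $\mu_2 = \sigma^2[\alpha + k(1 - \alpha)].$    (15)

It follows that

    $E\{\hat{\sigma} - \sigma\}^2 \approx \frac{\sigma^2}{4n}\{3[\alpha + k^2(1 - \alpha)] - [\alpha + k(1 - \alpha)]^2\}.$    (16)


If we are interested in the effect of very small contamination, $\alpha \approx 1$ and $1 - \alpha \approx 0$.
Assuming that $k(1 - \alpha) \approx 0$, we see that


    $E\{\hat{\sigma} - \sigma\}^2 \approx \frac{\sigma^2}{4n}\{3[1 + k^2(1 - \alpha)] - 1\} = \frac{\sigma^2}{2n}\left[1 + \frac{3}{2}k^2(1 - \alpha)\right].$    (17)

In the normal case, $\mu_4 = 3\sigma^4$ and $\mu_2^2 = \sigma^4$, so that from (11)


    $E\{\hat{\sigma} - \sigma\}^2 \approx \frac{\sigma^2}{2n}.$

Thus we see that the mean square error due to a small contamination is now multiplied by
a factor $[1 + \frac{3}{2}k^2(1 - \alpha)]$. If, for example, $k = 10$, $\alpha = 0.99$, then $1 + \frac{3}{2}k^2(1 - \alpha) = 5/2$. If
$k = 10$, $\alpha = 0.98$, then $1 + \frac{3}{2}k^2(1 - \alpha) = 4$, and so on.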
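
The size of this inflation is easy to tabulate. The sketch below (function names are illustrative, not from the text) computes the ratio of the approximate mean square error in (16) to the uncontaminated value σ²/(2n), alongside the small-contamination approximation 1 + (3/2)k²(1 − α) obtained by simplifying (16) as in (17).

```python
def mse_inflation(alpha, k):
    """Ratio of the approximate MSE in (16) to the uncontaminated value sigma^2/(2n)."""
    m4 = 3 * (alpha + k**2 * (1 - alpha))    # mu_4 / sigma^4, from (14)
    m2 = alpha + k * (1 - alpha)             # mu_2 / sigma^2, from (15)
    return (m4 - m2**2) / 2


def mse_inflation_small(alpha, k):
    """Small-contamination approximation 1 + (3/2) k^2 (1 - alpha)."""
    return 1 + 1.5 * k**2 * (1 - alpha)


for alpha in (1.0, 0.99, 0.98):
    print(alpha, round(mse_inflation(alpha, 10), 3), round(mse_inflation_small(alpha, 10), 3))
```

Both functions return 1 when α = 1; the approximation slightly overstates (16) because it drops the correction coming from the squared second moment.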

A quick comparison with $S_3$ shows that, although $S_1$ (or even $\hat{\sigma}$) is a better estimator of
$\sigma$ than $S_3$ if there is no contamination, $S_3$ becomes a much better estimator in the presence
of contamination as $k$ becomes large.

Next we consider the effect of deviation from model assumptions on tests of hypotheses.
One of the most commonly used tests in statistics is Student's t-test for testing the
mean of a normal population when the variance is unknown. Let $X_1, X_2, \ldots, X_n$ be a sample
from some population with mean $\mu$ and finite variance $\sigma^2$. As usual, let $\bar{X}$ denote the
sample mean, and $S^2$, the sample variance. If the population being sampled is normal, the
t-test rejects $H_0 : \mu = \mu_0$ against $H_1 : \mu \ne \mu_0$ at level $\alpha$ if $|\bar{X} - \mu_0| > t_{n-1,\alpha/2}(S/\sqrt{n})$. If
$n$ is large, we replace $t_{n-1,\alpha/2}$ by the corresponding critical value, $z_{\alpha/2}$, under the standard
normal law. If the sample does not come from a normal population, the statistic
$T = [(\bar{X} - \mu_0)/S]\sqrt{n}$ is no longer distributed as a $t(n - 1)$ statistic. If, however, $n$ is sufficiently
large, we know that $T$ has an asymptotic normal distribution irrespective of the
population being sampled, as long as it has a finite variance. Thus, for large $n$, the distribution
of $T$ is independent of the form of the population, and the t-test is stable. The
same considerations apply to testing the difference between two means when the two
variances are equal. Although we assumed that $n$ is sufficiently large for Slutsky's result
(Theorem 7.2.15) to hold, empirical investigations have shown that the test based on Student's
statistic is robust. Thus a significant value of t may not be interpreted to mean a
departure from normality of the observations.

Let us next consider the effect of departure from independence on the t-distribution.
Suppose that the observations $X_1, X_2, \ldots, X_n$
have a multivariate normal distribution with $EX_i = \mu$, $\operatorname{var}(X_i) = \sigma^2$, and $\rho$ as the common
correlation coefficient between any $X_i$ and $X_j$, $i \ne j$. Then


    $E\bar{X} = \mu$ and $\operatorname{var}(\bar{X}) = \frac{\sigma^2}{n}[1 + (n - 1)\rho],$    (18)

and since the $X_i$'s are exchangeable it follows from Remark 6.3.1 that


    $ES^2 = \sigma^2(1 - \rho).$    (19)


For large $n$, the statistic $\sqrt{n}(\bar{X} - \mu_0)/S$ will be asymptotically distributed as $N(0, 1 +
n\rho/(1 - \rho))$, instead of $N(0, 1)$. Under $H_0$, $\rho = 0$ and $T^2 = n(\bar{X} - \mu_0)^2/S^2$ is distributed as
$F(1, n - 1)$. Consider the ratio


    $\frac{nE(\bar{X} - \mu_0)^2}{ES^2} = \frac{\sigma^2[1 + (n - 1)\rho]}{\sigma^2(1 - \rho)} = 1 + \frac{n\rho}{1 - \rho}.$    (20)


The ratio equals 1 if $\rho = 0$ but is $> 1$ for $\rho > 0$ and $\to \infty$ as $\rho \to 1$. It follows that a large
value of $T$ is likely to occur when $\rho > 0$ and is large, even though $\mu_0$ is the true value of
the mean. Thus a significant value of t may be due to departure from independence, and
the effect can be serious.
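
A rough numerical illustration of how serious the effect can be: the sketch below evaluates the ratio in (20) and, treating the t-statistic as normal with that variance (the same crude approximation used in the text), the resulting rejection probability of a nominal two-sided test under the null hypothesis. The function names and the nominal level of 0.05 are assumptions made for the illustration.

```python
from statistics import NormalDist


def variance_inflation(n, rho):
    """Ratio n*E(mean - mu0)^2 / E(S^2) from (20), equal to 1 + n*rho/(1 - rho)."""
    return (1 + (n - 1) * rho) / (1 - rho)


def approx_size(n, rho, level=0.05):
    """Approximate true size of the nominal two-sided test when the statistic
    is treated as N(0, variance_inflation) instead of N(0, 1)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - level / 2)
    scale = variance_inflation(n, rho) ** 0.5
    return 2 * (1 - std_normal.cdf(z / scale))


for rho in (0.0, 0.05, 0.1, 0.3):
    print(rho, round(variance_inflation(25, rho), 3), round(approx_size(25, rho), 3))
```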

Next, consider a test of the null hypothesis $H_0 : \sigma = \sigma_0$ against $H_1 : \sigma \ne \sigma_0$. Under the
usual normality assumptions on the observations $X_1, X_2, \ldots, X_n$, the test statistic used is


    $V_0 = \frac{(n - 1)S^2}{\sigma_0^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma_0^2},$    (21)

which has a $\chi^2(n - 1)$ distribution under $H_0$. The usual test is to reject $H_0$ if

    $V_0 = \frac{(n - 1)S^2}{\sigma_0^2} > \chi^2_{n-1,\alpha/2}$ or $V_0 < \chi^2_{n-1,1-\alpha/2}.$    (22)


Let us suppose that $X_1, X_2, \ldots, X_n$ are not normal. It follows from Corollary 2 of Theorem
7.3.4 that


    $\operatorname{var}(S^2) = \frac{\mu_4}{n} + \frac{3 - n}{n(n - 1)}\,\mu_2^2,$    (23)


so that


    $\operatorname{var}\left(\frac{S^2}{\sigma^2}\right) = \frac{1}{n}\,\frac{\mu_4}{\sigma^4} + \frac{3 - n}{n(n - 1)}.$    (24)

Writing $\gamma_2 = (\mu_4/\sigma^4) - 3$, we have


    $\operatorname{var}\left(\frac{S^2}{\sigma^2}\right) = \frac{2}{n - 1} + \frac{\gamma_2}{n},$    (25)

compared with

    $\operatorname{var}\left(\frac{S^2}{\sigma^2}\right) = \frac{2}{n - 1}$    (26)


when the $X_i$'s are normal ($\gamma_2 = 0$). Now $(n - 1)S^2 = \sum_{j=1}^{n}(X_j - \bar{X})^2$ is the sum of $n$ identically
distributed but dependent RVs $(X_j - \bar{X})^2$, $j = 1, 2, \ldots, n$. Using a version of the central
limit theorem for dependent RVs (see, e.g., Cramér [17, p. 365]), it follows that


    $\left(\frac{n - 1}{2}\right)^{1/2}\left(\frac{S^2}{\sigma^2} - 1\right),$


under $H_0$, is asymptotically $N(0, 1 + (\gamma_2/2))$, and not $N(0, 1)$ as under the normal theory.
As a result the size of the test based on the statistic $V_0$ will be different from the stated
level of significance if $\gamma_2$ differs greatly from 0. It is clear that the effect of violation
of the normality assumption can be quite serious on inferences about variances, and the
chi-square test is not robust.
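
The effect of the kurtosis term on the size of the test can be sketched with the normal approximation above: if the standardized statistic is N(0, 1 + γ₂/2) rather than N(0, 1), a nominal two-sided level-α test rejects with probability 2[1 − Φ(z_{α/2}/√(1 + γ₂/2))]. The code below is an illustration under that large-sample approximation only; the function name and the example values of γ₂ (0 for the normal, 3 for the double exponential, −1.2 for the uniform) are supplied for the illustration.

```python
from statistics import NormalDist


def chi_square_test_size(gamma2, level=0.05):
    """Approximate large-sample size of the nominal two-sided variance test
    when the parent distribution has kurtosis excess gamma2."""
    if gamma2 <= -2:
        raise ValueError("this sketch requires gamma2 > -2")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - level / 2)
    scale = (1 + gamma2 / 2) ** 0.5     # sd of the limiting N(0, 1 + gamma2/2)
    return 2 * (1 - std_normal.cdf(z / scale))


for name, g2 in (("normal", 0.0), ("double exponential", 3.0), ("uniform", -1.2)):
    print(name, round(chi_square_test_size(g2), 3))
```

Heavy-tailed parents (γ₂ > 0) push the actual size above the nominal level, and light-tailed parents push it below.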

In the above discussion we have used somewhat crude calculations to investigate the 
behavior of the most commonly used estimators and test statistics when one or more of 
the underlying assumptions are violated. Our purpose here was to indicate that some tests 
or estimators are robust whereas others are not. The moral is clear: One should check 
carefully to see that the underlying assumptions are satisfied before using parametric 
procedures. 


13.7.2 Some Robust Procedures


Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous PDF $f(x - \theta)$, $\theta \in \Theta$, and assume
that $f$ is symmetric about $\theta$. We shall be interested in estimation or tests of hypotheses
concerning $\theta$. Our objective is to find procedures that perform well for several different
types of distributions but do not have to be optimal for any particular distribution. We will
call such procedures robust. We first consider estimation of $\theta$.

The estimators fall under one of the following three types: 


1. Estimators that are functions of $\mathbf{R} = (R_1, R_2, \ldots, R_n)$, where $R_j$ is the rank of $X_j$, are
known as R-estimators. Hodges and Lehmann [44] devised a method of deriving
such estimators from rank tests. These include the sample median $\tilde{X}$ (based on the
sign test) and $W = \operatorname{med}\{(X_i + X_j)/2,\ 1 \le i \le j \le n\}$ based on the Wilcoxon signed-rank
test.

2. Estimators of the form $\sum_{i=1}^{n} a_i X_{(i)}$ are called L-estimators, being linear combinations
of order statistics. This class includes the median, the mean, and the trimmed
mean obtained by dropping a prespecified proportion of extreme observations.

3. Maximum likelihood type estimators obtained as solutions to certain equations
$\sum_{j=1}^{n} \psi(X_j - \theta) = 0$ are called M-estimators. The function $\psi(t) = -f'(t)/f(t)$ gives
MLEs.


Definition 1. Let $k = [n\alpha]$ be the largest integer $\le n\alpha$ where $0 < \alpha < 1/2$. Then the
estimator


    $\bar{X}_\alpha = \frac{1}{n - 2k}\sum_{j=k+1}^{n-k} X_{(j)}$    (27)


is called a trimmed mean.

Two extreme examples of trimmed means are the sample mean $\bar{X}$ ($\alpha = 0$) and the
median $\tilde{X}$ when all except the central ($n$ odd) or the two central ($n$ even) observations
are excluded.


Example 1. Consider the following sample of size 15 taken from a symmetric
distribution.


0.97 0.66 0.73 0.78 1.30 0.58 0.79 0.94
0.52 0.52 0.83 1.25 1.47 0.96 0.71


Suppose $\alpha = 0.10$. Then $k = [n\alpha] = 1$ and


    $\bar{x}_{0.10} = \frac{\sum_{j=2}^{14} x_{(j)}}{13} = 0.85.$


Here $\bar{x} = 0.867$ and $\operatorname{med}_{1 \le j \le 15} x_j = x_{(8)} = 0.79$.
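
The computations in this example can be reproduced with a short script. This is a minimal sketch: `trimmed_mean` follows Definition 1, and `hodges_lehmann` is an illustrative helper (its name is not from the text) that takes the median of the averages (x_i + x_j)/2 over i ≤ j.

```python
import math
from itertools import combinations_with_replacement
from statistics import mean, median

data = [0.97, 0.66, 0.73, 0.78, 1.30, 0.58, 0.79, 0.94,
        0.52, 0.52, 0.83, 1.25, 1.47, 0.96, 0.71]


def trimmed_mean(xs, alpha):
    """Mean of the order statistics after dropping k = floor(n*alpha) from each end."""
    n = len(xs)
    k = math.floor(n * alpha)
    kept = sorted(xs)[k:n - k]
    return sum(kept) / len(kept)


def hodges_lehmann(xs):
    """Median of the averages (x_i + x_j)/2 over all pairs with i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(xs, 2))


print(round(trimmed_mean(data, 0.10), 2))   # trimmed mean with k = 1
print(round(mean(data), 3))                 # sample mean
print(median(data))                         # sample median
print(hodges_lehmann(data))                 # Hodges-Lehmann estimate W
```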


We will limit this discussion to four estimators of location, namely, the sample median,
trimmed mean, sample mean, and Hodges–Lehmann type estimator based on the Wilcoxon
signed-rank test. In order to compare the performance of two procedures A and B we will
use a (large sample) measure of relative efficiency due to Pitman. Pitman's asymptotic
relative efficiency (ARE) of procedure B relative to procedure A is the limit of the ratio
of sample sizes $n_A/n_B$, where $n_A$, $n_B$ are sample sizes needed for procedures A and B to
perform equivalently with respect to a specified criterion. For example, suppose $\{T_{n(A)}\}$
and $\{T_{n(B)}\}$ are two sequences of estimators for $\psi(\theta)$ such that


    $T_{n(A)} \sim AN\left(\psi(\theta), \frac{\sigma_A^2(\theta)}{n_A}\right),$


and


    $T_{n(B)} \sim AN\left(\psi(\theta), \frac{\sigma_B^2(\theta)}{n_B}\right).$


Suppose further that A and B perform equivalently if their asymptotic variances are the
same, that is,

    $\frac{\sigma_A^2(\theta)}{n_A} = \frac{\sigma_B^2(\theta)}{n_B}.$

Then

    $\frac{n_A}{n_B} = \frac{\sigma_A^2(\theta)}{\sigma_B^2(\theta)}.$


Clearly, different performance measures may lead to different measures of ARE. 

Similarly, if procedures A and B lead to two sequences of tests, then ARE is the limiting
ratio of the sample sizes needed by the tests to reach a certain power $\beta$ against the same
alternative and at the same limiting level $\alpha$.

Accordingly, let $e(B, A)$ denote the ARE of B relative to A. If $e(B, A) = 1/2$, say,
then procedure A requires (approximately) half as many observations as procedure B.
We will write $e_F(B, A)$, whenever necessary, to indicate the dependence of ARE on the
underlying DF $F$.

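For detailed discussion of Pitman efficiency we refer to Lehmann [61, pp. 371-380],
Lehmann [63, section 5.2], Serfling [102, chapter 10], Randles and Wolfe [85, chapter 5],
and Zacks [121]. The expressions for AREs of the median and the Hodges–Lehmann estimators
of location parameter $\theta$ with respect to the sample mean $\bar{X}$ are


    $e_F(\tilde{X}, \bar{X}) = 4\sigma_F^2 f^2(0),$    (28)

    $e_F(W, \bar{X}) = 12\sigma_F^2\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2,$    (29)


where $f$ is the PDF corresponding to $F$. In order to get $e_F(\tilde{X}, W)$ we use the fact that


    $e_F(\tilde{X}, W) = \frac{e_F(\tilde{X}, \bar{X})}{e_F(W, \bar{X})}.$    (30)


Bickel [5] showed that

    $e_F(\bar{X}_\alpha, \bar{X}) = \frac{\sigma_F^2}{\sigma_\alpha^2},$    (31)


where


    $\sigma_\alpha^2 = \frac{2}{(1 - 2\alpha)^2}\left[\int_0^{\mathfrak{z}_{1-\alpha}} t^2 f(t)\,dt + \alpha\,\mathfrak{z}_{1-\alpha}^2\right]$    (32)


and $\mathfrak{z}_\alpha$ is the unique $\alpha$th percentile of $F$. It is clear from (32) that no closed form expression
for $e_F(\bar{X}_\alpha, \bar{X})$ is possible for most DFs $F$.
In the following table we give the AREs for some selected $F$.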
ARE Computations for Selected F


F                                               $e(\tilde{X}, \bar{X})$      $e(W, \bar{X})$      $e(\tilde{X}, W)$
$U(-1/2, 1/2)$                                  1/3                  1                    1/3
$N(0, 1)$                                       $2/\pi = 0.637$      $3/\pi = 0.955$      2/3
Logistic, $f(x) = e^{-x}(1 + e^{-x})^{-2}$      $\pi^2/12 = 0.822$   1.10                 0.748
Double Exponential, $f(x) = (1/2)\exp(-|x|)$    2                    1.5                  4/3
$C(0, 1)$                                       $\infty$             $\infty$             4/3
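
The finite entries of this table can be checked numerically from the expressions e(median, mean) = 4σ²f²(0) and e(W, mean) = 12σ²[∫f²(x)dx]². The following is a minimal sketch using a plain midpoint rule; the helper names, the integration range, and the step count are choices made for the illustration, and the Cauchy row is omitted because its variance is infinite.

```python
import math


def integrate(g, lo, hi, steps=120_000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


def are_values(pdf, lo=-60.0, hi=60.0):
    """Return (e(median, mean), e(W, mean), e(median, W)) for a PDF symmetric about 0."""
    var = integrate(lambda x: x * x * pdf(x), lo, hi)
    int_f2 = integrate(lambda x: pdf(x) ** 2, lo, hi)
    e_med = 4 * var * pdf(0.0) ** 2
    e_w = 12 * var * int_f2 ** 2
    return e_med, e_w, e_med / e_w


def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def logistic_pdf(x):
    return math.exp(-x) / (1 + math.exp(-x)) ** 2


def double_exp_pdf(x):
    return 0.5 * math.exp(-abs(x))


for name, pdf in (("normal", normal_pdf), ("logistic", logistic_pdf),
                  ("double exponential", double_exp_pdf)):
    print(name, [round(v, 3) for v in are_values(pdf)])
```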


It can be shown that $e_F(\tilde{X}, \bar{X}) \ge 1/3$ for all symmetric $F$, so $\tilde{X}$ is quite inefficient
compared to $\bar{X}$ for $U(-1/2, 1/2)$. Even for normal $f$, $\tilde{X}$ would require 157 observations
to achieve the same accuracy that $\bar{X}$ achieves with 100 observations. For heavier-tailed
distributions, however, $\tilde{X}$ provides more protection than $\bar{X}$.

The values of $e(W, \bar{X})$, on the other hand, are quite high for most $F$ and, in fact,
$e_F(W, \bar{X}) \ge 0.864$ for all symmetric $F$. Even for normal $F$ one loses little (4.5%) in using
$W$ instead of $\bar{X}$. Thus $W$ is more robust as an estimator of $\theta$.

A look at the values of $e(\tilde{X}, W)$ shows that $\tilde{X}$ is worse than $W$ for distributions with
light tails but does slightly better than $W$ for heavier-tailed $F$.

Let us now compare the AREs of $\bar{X}_\alpha$, $\bar{X}$, and $W$. The following AREs for selected $\alpha$
are due to Bickel [5].


ARE Comparisons


                        $\alpha = 0.01$                              $\alpha = 0.05$
F                       $e(\bar{X}_\alpha, \bar{X})$   $e(W, \bar{X}_\alpha)$   $e(\bar{X}_\alpha, \bar{X})$   $e(W, \bar{X}_\alpha)$
Uniform                 0.96          1.04          0.83          1.20
Normal                  0.995         0.96          0.97          0.985
Double Exponential      1.06          1.41          1.21          1.24
Cauchy                  $\infty$      6.72          $\infty$      2.67


We note that $\bar{X}_\alpha$ performs quite well compared to $\bar{X}$. In fact, for the normal distribution the
efficiency is quite close to 1, so there is little loss in using $\bar{X}_\alpha$. For heavier-tailed distributions
$\bar{X}_\alpha$ is preferable. For small values of $\alpha$, it should be noted that $\bar{X}_\alpha$ does not differ
much from $\bar{X}$. Nevertheless, $\bar{X}_\alpha$ is more robust; it cannot do much worse than $\bar{X}$ but can
do much better. Compared to the Hodges–Lehmann estimator, $\bar{X}_\alpha$ does not perform as well.
The estimator $W$ provides better protection against outliers (heavy tails) and gives up little in the
normal case.

Finally we consider testing $H_0 : \theta = \theta_0$ against $H_1 : \theta > \theta_0$. Recall that $X_1, X_2, \ldots, X_n$
are iid with common continuous symmetric DF $F(x - \theta)$, $\theta \in \Theta$, and PDF $f(x - \theta)$.
Suppose $\sigma_F^2 = \operatorname{Var}(X_1) < \infty$. Let S denote the sign test based on the statistic
$R^+(\mathbf{X}) = \sum_{i=1}^{n} I_{[X_i > \theta_0]}$, W the Wilcoxon signed-rank test based on the
statistic $T^+(\mathbf{X}) = \sum_{1 \le i \le j \le n} I_{[X_i + X_j > 2\theta_0]}$, M the test based on the Z-statistic $Z =
\sqrt{n}(\bar{X} - \theta_0)/\sigma_F$, and t Student's t-test based on the statistic $\sqrt{n}(\bar{X} - \theta_0)/S$,
where $S^2$ is the sample variance.
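
The two nonparametric statistics just defined are simple counts, as the following sketch shows. The function names and the four-point sample are hypothetical; the signed-rank statistic is computed directly from the pairwise form above and, for this sample (which has no ties), agrees with the sum of the ranks of the positive differences.

```python
from itertools import combinations_with_replacement


def sign_statistic(xs, theta0):
    """R+ = number of observations exceeding theta0."""
    return sum(1 for x in xs if x > theta0)


def signed_rank_statistic(xs, theta0):
    """T+ = number of pairs i <= j with x_i + x_j > 2*theta0."""
    return sum(1 for a, b in combinations_with_replacement(xs, 2)
               if a + b > 2 * theta0)


sample = [1.0, 2.5, 4.0, 7.0]   # hypothetical data
print(sign_statistic(sample, 3.0), signed_rank_statistic(sample, 3.0))
```

Here the differences from 3.0 are −2, −0.5, 1, and 4, with absolute ranks 3, 1, 2, and 4, so the positive differences carry ranks 2 and 4.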

First note that $e(t, M) = 1$. Next we note that $e(S, t) = e_F(\tilde{X}, \bar{X})$ and $e_F(W, t) = e_F(W, \bar{X})$,
so that AREs are the same as given in (28), (29), and (30), and the values of ARE given in the
table for various $F$ remain the same for the corresponding tests.

Similar remarks apply as in the case of estimation of $\theta$. The sign test is not as efficient as the
Wilcoxon signed-rank test. But for heavier-tailed distributions such as the Cauchy and double
exponential, the sign test does better than the Wilcoxon signed-rank test.


PROBLEMS 13.7 


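1. Let $(X_1, X_2, \ldots, X_n)$ be jointly normal with $EX_i = \mu$, $\operatorname{var}(X_i) = \sigma^2$, and $\operatorname{cov}(X_i, X_j) =
\rho\sigma^2$ if $|i - j| = 1$, $i \ne j$, and $= 0$ otherwise.
(a) Show that

    $\operatorname{var}(\bar{X}) = \frac{\sigma^2}{n}\left[1 + 2\rho\left(1 - \frac{1}{n}\right)\right]$

and

    $E(S^2) = \sigma^2\left(1 - \frac{2\rho}{n}\right).$


(b) Show that the t-statistic $\sqrt{n}(\bar{X} - \mu)/S$ is asymptotically normally distributed with
mean 0 and variance $1 + 2\rho$. Conclude that the significance of t is overestimated
for positive values of $\rho$ and underestimated for $\rho < 0$ in large samples.


(c) For finite $n$, consider the statistic

    $T^2 = \frac{n(\bar{X} - \mu)^2}{S^2}.$

Compare the expected values of the numerator and the denominator of $T^2$ and
study the effect of $\rho \ne 0$ to interpret significant t values (Scheffé [101, p. 338]).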


2. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$:
(a) Show that


    $\mu_4 = 3\alpha(\alpha + 2)\beta^4.$


(b) Show that


    $\operatorname{var}\left\{\frac{(n - 1)S^2}{\sigma^2}\right\} \approx (n - 1)\left(2 + \frac{6}{\alpha}\right).$


(c) Show that the large sample distribution of $(n - 1)S^2/\sigma^2$ is normal.


(d) Compare the large-sample test of $H_0 : \sigma = \sigma_0$ based on the asymptotic normality
of $(n - 1)S^2/\sigma^2$ with the large-sample test based on the same statistic when the
observations are taken from a normal population. In particular, take $\alpha = 2$.
3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be two independent random samples from populations
with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Let $\bar{X}$, $\bar{Y}$ be
the two sample means, and $S_1^2$, $S_2^2$ be the two sample variances. Write $N = m + n$,
$R = m/n$, and $\theta = \sigma_1^2/\sigma_2^2$. The usual normal theory test of $H_0 : \mu_1 - \mu_2 = \delta_0$ is the
t-test based on the statistic


    $T = \frac{\bar{X} - \bar{Y} - \delta_0}{S_p(1/m + 1/n)^{1/2}},$

where

    $S_p^2 = \frac{(m - 1)S_1^2 + (n - 1)S_2^2}{m + n - 2}.$


Under $H_0$, the statistic $T$ has a t-distribution with $N - 2$ d.f., provided that $\sigma_1^2 = \sigma_2^2$.


Show that the asymptotic distribution of $T$ in the nonnormal case is
$N(0, (\theta + R)(1 + R\theta)^{-1})$ for large $m$ and $n$. Thus, if $R = 1$, $T$ is asymptotically
$N(0, 1)$ as in the normal theory case assuming equal variances, even though the two
samples come from nonnormal populations with unequal variances. Conclude that
the test is robust in the case of large, equal sample sizes (Scheffé [101, p. 339]).


4. Verify the computations in the table above using the expressions of ARE in (28),
(29), and (30).


5. Suppose $F$ is the DF of a $G(\alpha, \beta)$ RV. Show that


    $e(W, \bar{X}) = \frac{3\alpha\,\Gamma^2(2\alpha)}{2^{4(\alpha - 1)}(2\alpha - 1)^2\{\Gamma(\alpha)\}^4}.$


(Note that $F$ is not symmetric.)


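6. Suppose $F$ has PDF


    $f(x) = \frac{\Gamma(m)}{\Gamma(1/2)\,\Gamma(m - 1/2)\,(1 + x^2)^m}, \quad -\infty < x < \infty,$


for $m > 1$. Compute $e(\tilde{X}, \bar{X})$, $e(W, \bar{X})$, and $e(\tilde{X}, W)$. (From Problem 3.2.3, $E|X|^k < \infty$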
if k <<m—1/2.) 


FREQUENTLY USED SYMBOLS 
AND ABBREVIATIONS 


$\Rightarrow$                  implies

$\Leftrightarrow$              implies and is implied by

$\rightarrow$                  converges to

$\uparrow, \downarrow$         increasing, decreasing

VY nonincreasing, nondecreasing 

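$\Gamma(x)$                    gamma function

$\limsup, \liminf, \lim$       limit superior, limit inferior, limit

$\mathcal{R}, \mathcal{R}_n$   real line, n-dimensional Euclidean space
$\mathfrak{B}, \mathfrak{B}_n$ Borel $\sigma$-field on $\mathcal{R}$, Borel $\sigma$-field on $\mathcal{R}_n$
$I_A$                          indicator function of set A

$\varepsilon(x)$               $= 1$ if $x \ge 0$, and $= 0$ if $x < 0$

$\mu$                          $EX$, expected value

$m_n$                          $EX^n$, $n \ge 0$ integral

$\beta_\alpha$                 $E|X|^\alpha$, $\alpha > 0$

$\mu_k$                        $E(X - EX)^k$, $k \ge 0$ integral

$\sigma^2$                     $= \mu_2$, variance

$f', f'', f'''$                first, second, third derivative of $f$

$\sim$                         distributed as

$\approx$                      asymptotically (or approximately) equal to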
$\xrightarrow{L}$              convergence in law
$\xrightarrow{P}$              convergence in probability
$\xrightarrow{a.s.}$           convergence almost surely
$\xrightarrow{r}$              convergence in rth mean
RV                             random variable
DF                             distribution function
PDF                            probability density function
PMF                            probability mass function
PGF                            probability generating function
MGF                            moment generating function
d.f.                           degrees of freedom
BLUE                           best linear unbiased estimate
MLE                            maximum likelihood estimate
MVUE                           minimum variance unbiased estimate
UMA                            uniformly most accurate
UMVUE                          uniformly minimum variance unbiased estimate
UMAU                           uniformly most accurate unbiased
MP                             most powerful
UMP                            uniformly most powerful
GLM                            general linear model
i.o.                           infinitely often
iid                            independent, identically distributed
SD                             standard deviation
SE                             standard error
MLR                            monotone likelihood ratio
MSE                            mean square error
WLLN                           weak law of large numbers
SLLN                           strong law of large numbers
CLT                            central limit theorem
SPRT                           sequential probability ratio test
$b(1, p)$                      Bernoulli with parameter $p$
$b(n, p)$                      binomial with parameters $n$, $p$
$NB(r; p)$                     negative binomial with parameters $r$, $p$
$P(\lambda)$                   Poisson with parameter $\lambda$
$U[a, b]$                      uniform on $[a, b]$
$G(\alpha, \beta)$             gamma with parameters $\alpha$, $\beta$
$B(\alpha, \beta)$             beta with parameters $\alpha$, $\beta$
$\chi^2(n)$                    chi-square with d.f. $n$
$C(\mu, \theta)$               Cauchy with parameters $\mu$, $\theta$


$N(\mu, \sigma^2)$             normal with mean $\mu$, variance $\sigma^2$
$t(n)$                         Student's t with $n$ d.f.
$F(m, n)$                      F-distribution with $(m, n)$ d.f.
$z_\alpha$                     $100(1 - \alpha)$th percentile of $N(0, 1)$
$\chi^2_{n,\alpha}$            $100(1 - \alpha)$th percentile of $\chi^2(n)$
$t_{n,\alpha}$                 $100(1 - \alpha)$th percentile of $t(n)$
$F_{m,n,\alpha}$               $100(1 - \alpha)$th percentile of $F(m, n)$
$AN(\mu_n, \sigma_n^2)$        asymptotically normal
GLR                            generalized likelihood ratio
MRE                            minimum risk equivariant
$\ln x$                        logarithm (to base e) of $x$
$\exp(x)$                      exponential
LMP                            locally most powerful
$\mathcal{L}(X)$               law or distribution of RV $X$
$b(\delta, \theta)$            bias in estimator $\delta$
STATISTICAL TABLES


Table ST1. Cumulative Binomial Probabilities, $\sum_{x=0}^{r} \binom{n}{x} p^x (1 - p)^{n-x}$, $r = 0, 1, 2, \ldots, n - 1$


                                          p
 n   r    0.01    0.05    0.10    0.20    0.25    0.30   0.333    0.40    0.50
 2   0  0.9801  0.9025  0.8100  0.6400  0.5625  0.4900  0.4444  0.3600  0.2500
     1  0.9999  0.9975  0.9900  0.9600  0.9375  0.9100  0.8888  0.8400  0.7500
 3   0  0.9703  0.8574  0.7290  0.5120  0.4219  0.3430  0.2963  0.2160  0.1250
     1  0.9997  0.9928  0.9720  0.8960  0.8438  0.7840  0.7407  0.6480  0.5000
     2  1.0000  0.9999  0.9990  0.9920  0.9844  0.9730  0.9629  0.9360  0.8750
 4   0  0.9606  0.8145  0.6561  0.4096  0.3164  0.2401  0.1975  0.1296  0.0625
     1  0.9994  0.9860  0.9477  0.8192  0.7383  0.6517  0.5926  0.4742  0.3125
     2  1.0000  0.9995  0.9963  0.9728  0.9492  0.9163  0.8889  0.8198  0.6875
     3          1.0000  0.9999  0.9984  0.9961  0.9919  0.9877  0.9734  0.9375
 5   0  0.9510  0.7738  0.5905  0.3277  0.2373  0.1681  0.1317  0.0778  0.0312
     1  0.9990  0.9774  0.9185  0.7373  0.6328  0.5283  0.4609  0.3370  0.1874
     2  1.0000  0.9988  0.9914  0.9421  0.8965  0.8370  0.7901  0.6826  0.4999
     3          0.9999  0.9995  0.9933  0.9844  0.9693  0.9547  0.9130  0.8124
     4          1.0000  1.0000  0.9997  0.9990  0.9977  0.9959  0.9898  0.9686
 6   0  0.9415  0.7351  0.5314  0.2621  0.1780  0.1176  0.0878  0.0467  0.0156
     1  0.9986  0.9672  0.8857  0.6553  0.5340  0.4201  0.3512  0.2333  0.1094
     2  1.0000  0.9977  0.9841  0.9011  0.8306  0.7442  0.6804  0.5443  0.3438
     3          0.9998  0.9987  0.9830  0.9624  0.9294  0.8999  0.8208  0.6563
     4          0.9999  0.9999  0.9984  0.9954  0.9889  0.9822  0.9590  0.8907
     5          1.0000  1.0000  0.9999  0.9998  0.9991  0.9987  0.9959  0.9845
 7   0  0.9321  0.6983  0.4783  0.2097  0.1335  0.0824  0.0585  0.0280  0.0078
     1  0.9980  0.9556  0.6554  0.5767  0.4450  0.3294  0.2633  0.1586  0.0625
     2  1.0000  0.9962  0.8503  0.8520  0.7565  0.6471  0.5706  0.4199  0.2266
     3          0.9998  0.9743  0.9667  0.9295  0.8740  0.8267  0.7102  0.5000
     4          1.0000  0.9973  0.9953  0.9872  0.9712  0.9547  0.9037  0.7734
     5                  0.9998  0.9996  0.9987  0.9962  0.9931  0.9812  0.9375
     6                  1.0000  1.0000  0.9999  0.9998  0.9995  0.9984  0.9922
 8   0  0.9227  0.6634  0.4305  0.1678  0.1001  0.0576  0.0390  0.0168  0.0039
     1  0.9973  0.9427  0.8131  0.5033  0.3671  0.2553  0.1951  0.1064  0.0352
     2  0.9999  0.9942  0.9619  0.7969  0.6786  0.5518  0.4682  0.3154  0.1445
     3  1.0000  0.9996  0.9950  0.9437  0.8862  0.8059  0.7413  0.5941  0.3633
     4          1.0000  0.9996  0.9896  0.9727  0.9420  0.9120  0.8263  0.6367
     5                  1.0000  0.9988  0.9958  0.9887  0.9803  0.9502  0.8555
     6                          1.0000  0.9996  0.9987  0.9974  0.9915  0.9648
     7                                  1.0000  0.9999  0.9998  0.9993  0.9961
 9   0  0.9135  0.6302  0.3874  0.1342  0.0751  0.0404  0.0260  0.0101  0.0020
     1  0.9965  0.9287  0.7748  0.4362  0.3004  0.1960  0.1431  0.0706  0.0196
     2  0.9999  0.9916  0.9470  0.7382  0.6007  0.4628  0.3772  0.2318  0.0899
     3  1.0000  0.9993  0.9916  0.9144  0.8343  0.7296  0.6503  0.4826  0.2540
     4          0.9999  0.9990  0.9805  0.9511  0.9011  0.8551  0.7334  0.5001
n


s 


0.01 


0.05 


0.10 


0.20 


0.30 


0.333 


0.40 


0.50 


10 


11 


12 


13 


— 
ee


0.9044 
0.9958 
1.0000 


0.8954 
0.9948 
0.9998 
1.0000 


0.8864 
0.9938 
0.9998 
1.0000 
1.0000 
1.0000 


0.8775 
0.9928 
0.9997 
1.0000 


1.0000 


0.5987 
0.9138 
0.9884 
0.9989 
0.9999 
1.0000 


0.5688 
0.8981 
0.9848 
0.9984 
0.9999 
1.0000 


0.5404 
0.8816 
0.9804 
0.9978 
0.9998 
1.0000 


0.5134 
0.8746 
0.9755 
0.9969 
0.9997 
1.0000 


0.9998 
0.9999 
1.0000 


0.3487 
0.7361 
0.9298 
0.9872 
0.9984 
0.9999 
1.0000 


0.3138 
0.6974 
0.9104 
0.9815 
0.9972 
0.9997 
1.0000 


0.2824 
0.6590 
0.8892 
0.9744 
0.9957 
0.9995 
1.0000 


0.2542 
0.6214 
0.8661 
0.9659 
0.9936 
0.9991 


0.9970 
0.9998 
1.0000 


0.1074 
0.3758 
0.6778 
0.8791 
0.9672 
0.9936 
0.9991 
0.9999 
1.0000 


0.0859 
0.3221 
0.6174 
0.8389 
0.9496 
0.9884 
0.9981 
0.9998 
1.0000 


0.0687 
0.2749 
0.5584 
0.7946 
0.9806 
0.9961 
0.9994 
0.9999 
1.0000 


0.0550 
0.2337 
0.5017 
0.7473 
0.9009 
0.9700 


0.9746 
0.9956 
0.9995 
0.9999 
0.0282 
0.1493 
0.3828 
0.6496 
0.8497 
0.9526 
0.9894 
0.9984 
0.9998 
1.0000 
0.0198 
0.1130 
0.3128 
0.5696 
0.7897 
0.9218 
0.9784 
0.9947 
0.9994 
0.9999 
1.0000 
0.0139 
0.0850 
0.2528 
0.4925 
0.7237 
0.8822 
0.9614 
0.9905 
0.9983 
0.9998 
1.0000 


0.0097 
0.0637 
0.2025 
0.4206 
0.6543 
0.8346 


0.9575 
0.9916 
0.9989 
0.9998 
0.0173 
0.1040 
0.2991 
0.5592 
0.7868 
0.9234 
0.9803 
0.9966 
0.9996 
0.9999 
0.0116 
0.0752 
0.2341 
0.4726 
0.7110 
0.8779 
0.9614 
0.9912 
0.9986 
0.9999 
1.0000 
0.0077 
0.0540 
0.1811 
0.3931 
0.6315 
0.8223 
0.9336 
0.9812 
0.9962 
0.9995 
0.9999 
1.0000 
0.0052 
0.0386 
0.1388 
0.3224 
0.5521 
0.7587 


0.9006 
0.9749 
0.9961 
0.9996 
0.0060 
0.0463 
0.1672 
0.3812 
0.6320 
0.8327 
0.9442 
0.9867 
0.9973 
0.9999 
0.0036 
0.0320 
0.1189 
0.2963 
0.5328 
0.7535 
0.9007 
0.9707 
0.9941 
0.9993 
1.0000 
0.0022 
0.0196 
0.0835 
0.2254 
0.4382 
0.6652 
0.8418 
0.9427 
0.9848 
0.9972 
0.9997 
1.0000 
0.0013 
0.0126 
0.0579 
0.1686 
0.3531 
0.5744 


0.7462 
0.9103 
0.9806 
0.9982 
0.0010 
0.0108 
0.0547 
0.1719 
0.3770 
0.6231 
0.8282 
0.9454 
0.9893 
0.9991 
0.0005 
0.0059 
0.0327 
0.1133 
0.2744 
0.5000 
0.7256 
0.8867 
0.9673 
0.9941 
0.9995 
0.0002 
0.0032 
0.0193 
0.0730 
0.1939 
0.3872 
0.6128 
0.8062 
0.9270 
0.9807 
0.9968 
0.9998 
0.0000 
0.0017 
0.0112 
0.0462 
0.1334 
0.2905 
Table ST1. (Continued)
                                          p
 n   r    0.01    0.05    0.10    0.20    0.25    0.30   0.333    0.40    0.50
13   6                  0.9999  0.9930  0.9757  0.9376  0.8965  0.7712  0.5000
     7                  1.0000  0.9988  0.9944  0.9818  0.9654  0.9024  0.7095
     8                          0.9998  0.9990  0.9960  0.9912  0.9679  0.8666
     9                          1.0000  0.9999  0.9994  0.9984  0.9922  0.9539
    10                                  1.0000  0.9999  0.9998  0.9987  0.9888
    11                                          1.0000  1.0000  0.9999  0.9983
    12                                                          1.0000  0.9999
14   0  0.8687  0.4877  0.2288  0.0440  0.0178  0.0068  0.0034  0.0008  0.0000
     1  0.9916  0.8470  0.5847  0.1979  0.1010  0.0475  0.0274  0.0081  0.0009
     2  0.9997  0.9700  0.8416  0.4480  0.2812  0.1608  0.1054  0.0398  0.0065
     3  1.0000  0.9958  0.9559  0.6982  0.5214  0.3552  0.2612  0.1243  0.0287
     4          0.9996  0.9908  0.8702  0.7416  0.5842  0.4755  0.2793  0.0898
     5          1.0000  0.9986  0.9562  0.8884  0.7805  0.6898  0.4859  0.2120
     6                  0.9998  0.9884  0.9618  0.9067  0.8506  0.6925  0.3953
     7                  1.0000  0.9976  0.9897  0.9686  0.9424  0.8499  0.6048
     8                          0.9996  0.9979  0.9917  0.9826  0.9417  0.7880
     9                          1.0000  0.9997  0.9984  0.9960  0.9825  0.9102
    10                                  1.0000  0.9998  0.9993  0.9961  0.9713
    11                                          1.0000  0.9999  0.9994  0.9936
    12                                                  1.0000  0.9999  0.9991
    13                                                                  0.9999
15   0  0.8601  0.4633  0.2059  0.0352  0.0134  0.0048  0.0023  0.0005  0.0000
     1  0.9904  0.8291  0.5491  0.1672  0.0802  0.0353  0.0194  0.0052  0.0005
     2  0.9996  0.9638  0.8160  0.3980  0.2361  0.1268  0.0794  0.0271  0.0037
     3  1.0000  0.9946  0.9444  0.6482  0.4613  0.2969  0.2092  0.0905  0.0176
     4          0.9994  0.9873  0.8358  0.6865  0.5255  0.4041  0.2173  0.0592
     5          1.0000  0.9978  0.9390  0.8516  0.7216  0.6184  0.4032  0.1509
     6                  0.9997  0.9820  0.9434  0.8689  0.7970  0.6098  0.3036
     7                  1.0000  0.9958  0.9827  0.9500  0.9118  0.7869  0.5000
     8                          0.9992  0.9958  0.9848  0.9692  0.9050  0.6964
     9                          0.9999  0.9992  0.9964  0.9915  0.9662  0.8491
    10                          1.0000  0.9999  0.9993  0.9982  0.9907  0.9408
    11                                  1.0000  0.9999  0.9997  0.9981  0.9824
    12                                          1.0000  1.0000  0.9997  0.9963
    13                                                          1.0000  0.9995
    14                                                                  1.0000


Source: For n = 2 through 10, adapted with permission from E. Parzen, Modern Probability Theory and Its 
Applications, John Wiley, New York, 1962. For n = 11 through 15, adapted with permission from Tables of 
Cumulative Binomial Probability Distribution, Harvard University Press, Cambridge, M.A., 1955. 
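Entries of Table ST1 can be spot-checked numerically. The sketch below assumes the table reports the cumulative probability P(X ≤ r) for X ~ B(n, p), which is how the printed rows read; the function name binomial_cdf is illustrative and does not come from the text.

```python
from math import comb


def binomial_cdf(r, n, p):
    """Return P(X <= r) for X ~ Binomial(n, p) by summing the PMF."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r + 1))


# Compare with the rows n = 15, r = 7, p = 0.50 and n = 14, r = 0, p = 0.05.
print(round(binomial_cdf(7, 15, 0.50), 4))
print(round(binomial_cdf(0, 14, 0.05), 4))
```

Direct summation is adequate for the small n tabulated here; for large n a library routine would be the more careful choice.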
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641 
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247 
0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859 
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483 
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121 
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776 
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451 
0.7 0.2420 0.2389 0.2358 0.2327 0.2297 0.2266 0.2236 0.2206 0.2177 0.2148 
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867 
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611 
1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379 
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170 
1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985 
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823 
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681 
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559 
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455 
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367 
1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294 
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233 
2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183 
2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143 
2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110 
2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084 
2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064 
2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048 
2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036 
2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026 
2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019 
2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014 
3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010 


Source: Adapted with permission from P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley, 
New York, 1971, p. 391. 
ᵃ This table gives the probability that the standard normal variable Z will exceed a given positive value z, that is, 
$P\{Z > z_\alpha\} = \alpha$. The probabilities for negative values of z are obtained by symmetry. 
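The upper-tail probabilities described in this footnote can be reproduced with the complementary error function, using the identity P(Z > z) = erfc(z/√2)/2. The following is a minimal sketch; the name upper_tail is illustrative and not part of the text.

```python
from math import erfc, sqrt


def upper_tail(z):
    """Return P(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2.0))


# Entry for z = 1.96 (row 1.9, column 0.06).
print(round(upper_tail(1.96), 4))
# Negative z by symmetry: P(Z > -z) = 1 - P(Z > z).
print(round(1.0 - upper_tail(1.0), 4))
```

The symmetry line mirrors the rule stated in the footnote for negative values of z.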
Table ST4. Student’s t-Distributionᵃ 

α 
n 0.10 0.05 0.025 0.01 0.005 
1 3.078 6.314 12.706 31.821 63.657 
2 1.886 2.920 4.303 6.965 9.925 
3 1.638 2.353 3.182 4.541 5.841 
4 1.533 2.132 2.776 3.747 4.604 
5 1.476 2.015 2.571 3.365 4.032 
6 1.440 1.943 2.447 3.143 3.707 
7 1.415 1.895 2.365 2.998 3.499 
8 1.397 1.860 2.306 2.896 3.355 
9 1.383 1.833 2.262 2.821 3.250 
10 1.372 1.812 2.228 2.764 3.169 
11 1.363 1.796 2.201 2.718 3.106 
12 1.356 1.782 2.179 2.681 3.055 
13 1.350 1.771 2.160 2.650 3.012 
14 1.345 1.761 2.145 2.624 2.977 
15 1.341 1.753 2.131 2.602 2.947 
16 1.337 1.746 2.120 2.583 2.921 
17 1.333 1.740 2.110 2.567 2.898 
18 1.330 1.734 2.101 2.552 2.878 
19 1.328 1.729 2.093 2.539 2.861 
20 1.325 1.725 2.086 2.528 2.845 
21 1.323 1.721 2.080 2.518 2.831 
22 1.321 1.717 2.074 2.508 2.819 
23 1.319 1.714 2.069 2.500 2.807 
24 1.318 1.711 2.064 2.492 2.797 
25 1.316 1.708 2.060 2.485 2.787 
26 1.315 1.706 2.056 2.479 2.779 
27 1.314 1.703 2.052 2.473 2.771 
28 1.313 1.701 2.048 2.467 2.763 
29 1.311 1.699 2.045 2.462 2.756 
30 1.310 1.697 2.042 2.457 2.750 
40 1.303 1.684 2.021 2.423 2.704 
60 1.296 1.671 2.000 2.390 2.660 
120 1.289 1.658 1.980 2.358 2.617 
∞ 1.282 1.645 1.960 2.326 2.576 


Source: P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley, New York, 1971, p. 393. Reprinted 
by permission of John Wiley & Sons, Inc. 

ᵃ The first column lists the number of degrees of freedom (n). The headings of the other columns give probabilities 
(α) for t to exceed the entry value. Use symmetry for negative t values. 
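The last row of the table (infinite degrees of freedom) coincides with the upper quantiles of the standard normal distribution, so it can be checked with the Python standard library alone. This is a sketch of that limiting row only; finite degrees of freedom would need a t-distribution routine from a numerical library, which is not shown here. The function name is illustrative.

```python
from statistics import NormalDist


def normal_upper_quantile(alpha):
    """Return z such that P(Z > z) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1.0 - alpha)


# Reproduce the limiting row for the five tabulated tail probabilities.
for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    print(alpha, round(normal_upper_quantile(alpha), 3))
```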
Table ST6. Random Normal Numbers, μ = 0 and σ = 1 
1 2 3 4 5 6 7 8 9 10 


0.464 0.137 2.455 -0.323 -0.068 0.290 -0.288 1.298 0.241 -0.957 
0.060 —2.526 —0.531 —0.194 0.543 —1.558 0.187 —1.190 0.022 0.525 
1.486 —0.354 —0.634 0.697 0.926 1.375 0.785 —0.963 —0.853 —1.865 
1.022 -0.472 1.279 3.521 0.571 -1.851 0.194 1.192 -0.501 -0.273 
1.394 -0.555 0.046 0.321 2.945 1.974 -0.258 0.412 0.439 -0.035 
0.906 —0.513 —0.525 0.595 0.881 —0.934 1.579 0.161 —1.885 0.371 
1.179 -1.055 0.007 0.769 0.971 0.712 1.090 —0.631 —0.255 —0.702 
1.501 —0.488 —0.162 —0.136 1.033 0.203 0.448 0.748 —0.423 —0.432 
-0.690 0.756 -1.618 -0.345 -0.511 -2.051 -0.457 -0.218 0.857 -0.465 
1.372 0.225 0.378 0.761 0.181 -0.736 0.960 -1.530 -0.260 0.120 
-0.482 1.678 -0.057 -1.229 -0.486 0.856 -0.491 -1.983 -2.830 -0.238 
-1.376 -0.150 1.356 -0.561 -0.256 -0.212 0.219 0.779 0.953 -0.869 
-1.010 0.598 0.918 1.598 0.065 0.415 -0.169 0.313 -0.973 -1.016 
—0.005 —0.899 0.012 —0.725 1.147 —0.121 1.096 0.481 —1.691 0.417 
1.393 1.163 —0.911 1.231 —0.199 —0.246 1.239 —2.574 —0.558 0.056 
1.787 -0.261 1.237 1.046 -0.508 -1.630 -0.146 -0.392 -0.627 0.561 
0.105 -0.357 -1.384 0.360 -0.992 -0.116 -1.698 -2.832 -1.108 -2.357 
—1.339 1.827 0.959 0.424 0.969 —1.141 —1.041 0.362 1.726 1.956 
1.041 0.535 0.731 1.377 0.983 —1.330 1.620 —1.040 0.524 —0.281 
0.279 -2.056 0.717 -0.873 -1.096 -1.396 1.047 0.089 -0.573 0.932 
1.805 —2.008 —1.633 0.542 0.250 —0.166 0.032 0.079 0.471 —1.029 
-1.186 1.180 1.114 0.882 1.265 -0.202 0.151 -0.376 -0.310 0.479 
0.658 —1.141 1.151 1.210 0.927 0.425 0.290 —0.902 0.610 2.709 
—0.439 0.358 —1.939 0.891 —0.227 0.602 0.873 —0.437 —0.220 —0.057 
1.399 —0.230 0.385 —0.649 —0.577 0.237 —0.289 0.513 0.738 —0.300 
0.199 0.208 1.083 —0.219 —0.291 1.221 1.119 0.004 —2.015 —0.594 
0.159 0.272 -0.313 0.084 -2.828 -0.430 -0.792 -1.275 -0.623 -1.047 
2.273 0.606 0.606 —0.747 0.247 1.291 0.063 —1.793 —0.699 —1.347 
0.041 —0.307 0.121 0.790 —0.584 0.541 0.484 —0.986 0.481 0.996 
—1.132 —2.098 0.921 0.145 0.446 —1.661 1.045 —1.363 —0.586 —1.023 
0.768 0.079 —1.473 0.034 —2.127 0.665 0.084 —0.880 —0.579 0.551 
0.375 —1.658 —0.851 0.234 —0.656 0.340 —0.086 —0.158 —0.120 0.418 
—0.513 —0.344 0.210 —0.736 1.041 0.008 0.427 —0.831 0.191 0.074 
0.292 -0.521 1.266 -1.206 -0.899 0.110 -0.528 -0.813 0.071 0.524 
1.026 2.990 —0.574 0.491 —1.114 1.297 -1.433 -1.345 -3.001 0.479 
—1.334 1.278 —0.568 —0.109 —0.515 —0.566 2.923 0.500 0.359 0.326 
0.287 —0.144 -0.254 0.574 —0.451 —1.181 —1.190 —0.318 —0.094 1.114 
0.161 —0.886 —0.921 —0.509 1.410 —0.518 0.192 —0.432 1.501 1.068 
—1.346 0.193 1.202 0.394 1.045 0.843 0.942 1.045 0.031 0.772 
1.250 —0.199 —0.288 1.810 1.378 0.584 1.216 0.733 0.402 0.226 
0.630 —0.537 0.782 0.060 0.499 —0.431 1.705 1.164 0.884 —0.298 
0.375 —1.941 0.247 —0.491 0.665 —0.135 —0.145 -0.498 0.457 1.064 
1 2 3 4 5 6 7 8 9 10 
—1.420 0.489 —1.711 —1.186 0.754 —0.732 —0.066 1.006 —0.798 0.162 
0.151 0.243 0.430 0.762 0.298 1.049 1.810 2.885 —0.768 —0.129 
—0.309 0.531 0.416 —1.541 1.456 2.040 —0.124 0.196 0.023 —1.204 
0.424 —0.444 0.593 0.993 —0.106 0.116 0.484 —1.272 1.066 1.097 
0.593 0.658 1.127 1.407 1.579 1.616 1.458 1.262 0.736 —0.916 
0.862 0.885 0.142 0.504 0.532 1.381 0.022 —0.281 —0.342 1.222 
0.235 0.628 0.023 0.463 0.899 0.394 0.538 1.707 —0.188 —1.153 
—0.853 0.402 0.777 0.833 0.410 —0.349 —1.094 0.580 1.395 1.298 
Source: From tables of the RAND Corporation, by permission. 
Table ST7. Critical Values of the Kolmogorov-Smirnov One-Sample Test Statisticᵃ 
One-Sided Test: 
α = 0.10 0.05 0.025 0.01 0.005 α = 0.10 0.05 0.025 0.01 0.005 
Two-Sided Test: 
α = 0.20 0.10 0.05 0.02 0.01 α = 0.20 0.10 0.05 0.02 0.01 
n=1 0.900 0.950 0.975 0.990 0.995 n=21 0.226 0.259 0.287 0.321 0.344 
2 0.684 0.776 0.842 0.900 0.929 22 0.221 0.253 0.281 0.314 0.337 
3 0.565 0.636 0.708 0.785 0.829 23 0.216 0.247 0.275 0.307 0.330 
4 0.493 0.565 0.624 0.689 0.734 24 0.212 0.242 0.269 0.301 0.323 
5 0.447 0.509 0.563 0.627 0.669 25 0.208 0.238 0.264 0.295 0.317 
6 0.410 0.468 0.519 0.577 0.617 26 0.204 0.233 0.259 0.290 0.311 
7 0.381 0.436 0.483 0.538 0.576 27 0.200 0.229 0.254 0.284 0.305 
8 0.358 0.410 0.454 0.507 0.542 28 0.197 0.225 0.250 0.279 0.300 
9 0.339 0.387 0.430 0.480 0.513 29 0.193 0.221 0.246 0.275 0.295 
10 0.323 0.369 0.409 0.457 0.489 30 0.190 0.218 0.242 0.270 0.290 
11 0.308 0.352 0.391 0.437 0.468 31 0.187 0.214 0.238 0.266 0.285 
12 0.296 0.338 0.375 0.419 0.449 32 0.184 0.211 0.234 0.262 0.281 
13 0.285 0.325 0.361 0.404 0.432 33 0.182 0.208 0.231 0.258 0.277 
14 0.275 0.314 0.349 0.390 0.418 34 0.179 0.205 0.227 0.254 0.273 
15 0.266 0.304 0.338 0.377 0.404 35 0.177 0.202 0.224 0.251 0.269 
16 0.258 0.295 0.327 0.366 0.392 36 0.174 0.199 0.221 0.247 0.265 
17 0.250 0.286 0.318 0.355 0.381 37 0.172 0.196 0.218 0.244 0.262 
18 0.244 0.279 0.309 0.346 0.371 38 0.170 0.194 0.215 0.241 0.258 
19 0.237 0.271 0.301 0.337 0.361 39 0.168 0.191 0.213 0.238 0.255 
20 0.232 0.265 0.294 0.329 0.352 40 0.165 0.189 0.210 0.235 0.252 
Approximation for n > 40: 1.07/√n 1.22/√n 1.36/√n 1.52/√n 1.63/√n 


Source: Adapted by permission from Table 1 of Leslie H. Miller, Table of percentage points of Kolmogorov 
statistics, J. Am. Stat. Assoc. 51 (1956), 111-121. 

ᵃ This table gives the values of $D^+_{n,\alpha}$ and $D_{n,\alpha}$ for which $\alpha \ge P\{D^+_n > D^+_{n,\alpha}\}$ and $\alpha \ge P\{D_n > D_{n,\alpha}\}$ for some 
selected values of n and α. 
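For sample sizes beyond the tabulated range, the approximation row of Table ST7 can be applied directly. The sketch below reads that row as a constant divided by √n, keyed by the two-sided level; the dictionary and function names are illustrative and not part of the text.

```python
from math import sqrt

# Constants from the approximation row, keyed by the two-sided alpha
# (the matching one-sided levels are half of each key).
KS_COEFF = {0.20: 1.07, 0.10: 1.22, 0.05: 1.36, 0.02: 1.52, 0.01: 1.63}


def ks_critical_large_n(n, alpha_two_sided):
    """Approximate one-sample critical value c / sqrt(n), for n > 40 only."""
    if n <= 40:
        raise ValueError("use the tabulated values for n <= 40")
    return KS_COEFF[alpha_two_sided] / sqrt(n)


print(round(ks_critical_large_n(100, 0.05), 3))
```

Refusing n ≤ 40 keeps the function from silently replacing the exact tabulated values with the cruder approximation.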
Table ST8. Critical Values of the Kolmogorov-Smirnov Test Statistic for Two Samples of 
Equal Sizeᵃ 

One-Sided Test: 
α = 0.10 0.05 0.025 0.01 0.005 α = 0.10 0.05 0.025 0.01 0.005 
Two-Sided Test: 
α = 0.20 0.10 0.05 0.02 0.01 α = 0.20 0.10 0.05 0.02 0.01 

n = 3 2/3 2/3 n = 20 6/20 7/20 8/20 9/20 10/20 
4 3/4 3/4 3/4 21 6/21 7/21 8/21 9/21 10/21 
5 3/5 3/5 4/5 4/5 4/5 22 7/22 8/22 8/22 10/22 10/22 
6 3/6 4/6 4/6 5/6 5/6 23 7/23 8/23 9/23 10/23 10/23 
7 4/7 4/7 5/7 5/7 5/7 24 7/24 8/24 9/24 10/24 11/24 
8 4/8 4/8 5/8 5/8 6/8 25 7/25 8/25 9/25 10/25 11/25 
9 4/9 5/9 5/9 6/9 6/9 26 7/26 8/26 9/26 10/26 11/26 
10 4/10 5/10 6/10 6/10 7/10 27 7/27 8/27 9/27 11/27 11/27 
11 5/11 5/11 6/11 7/11 7/11 28 8/28 9/28 10/28 11/28 12/28 
12 5/12 5/12 6/12 7/12 7/12 29 8/29 9/29 10/29 11/29 12/29 
13 5/13 6/13 6/13 7/13 8/13 30 8/30 9/30 10/30 11/30 12/30 
14 5/14 6/14 7/14 7/14 8/14 31 8/31 9/31 10/31 11/31 12/31 
15 5/15 6/15 7/15 8/15 8/15 32 8/32 9/32 10/32 12/32 12/32 
16 6/16 6/16 7/16 8/16 9/16 34 8/34 10/34 11/34 12/34 13/34 
17 6/17 7/17 7/17 8/17 9/17 36 9/36 10/36 11/36 12/36 13/36 
18 6/18 7/18 8/18 9/18 9/18 38 9/38 10/38 11/38 13/38 14/38 
19 6/19 7/19 8/19 9/19 9/19 40 9/40 10/40 12/40 13/40 14/40 

Approximation for n > 40: 1.52/√n 1.73/√n 1.92/√n 2.15/√n 2.30/√n 


Source: Adapted by permission from Tables 2 and 3 of Z. W. Birnbaum and R. A. Hall, Small sample distributions 
for multisample statistics of the Smirnov type, Ann. Math. Stat. 31 (1960), 710-720. 

ᵃ This table gives the values of $D^+_{n,n,\alpha}$ and $D_{n,n,\alpha}$ for which $\alpha \ge P\{D^+_{n,n} > D^+_{n,n,\alpha}\}$ and $\alpha \ge P\{D_{n,n} > D_{n,n,\alpha}\}$ 
for some selected values of n and α. 
Table ST9. Critical Values of the Kolmogorov-Smirnov Test Statistic for Two Samples of 
Unequal Sizeᵃ 

One-Sided Test: α = 0.10 0.05 0.025 0.01 0.005 
Two-Sided Test: α = 0.20 0.10 0.05 0.02 0.01 
N1 = 1 N2 = 9 17/18 
10 9/10 
N1 = 2 N2 = 3 5/6 
4 3/4 
5 4/5 4/5 
6 5/6 5/6 
7 5/7 6/7 
8 3/4 7/8 7/8 
9 7/9 8/9 8/9 
10 7/10 4/5 9/10 
N1 = 3 N2 = 4 3/4 3/4 
5 2/3 4/5 4/5 
6 2/3 2/3 5/6 
7 2/3 5/7 6/7 6/7 
8 5/8 3/4 3/4 7/8 
9 2/3 2/3 7/9 8/9 8/9 
10 3/5 7/10 4/5 9/10 9/10 
12 7/12 2/3 3/4 5/6 11/12 
N1 = 4 N2 = 5 3/5 3/4 4/5 4/5 
6 7/12 2/3 3/4 5/6 5/6 
7 17/28 5/7 3/4 6/7 6/7 
8 5/8 5/8 3/4 7/8 7/8 
9 5/9 2/3 3/4 7/9 8/9 
10 11/20 13/20 7/10 4/5 4/5 
12 7/12 2/3 2/3 3/4 5/6 
16 9/16 5/8 11/16 3/4 13/16 
N1 = 5 N2 = 6 3/5 2/3 2/3 5/6 5/6 
7 4/7 23/35 5/7 29/35 6/7 
8 11/20 5/8 27/40 4/5 4/5 
9 5/9 3/5 31/45 7/9 4/5 
10 1/2 3/5 7/10 7/10 4/5 
15 8/15 3/5 2/3 11/15 11/15 
20 1/2 11/20 3/5 7/10 3/4 
Table ST9. (Continued) 

One-Sided Test: α = 0.10 0.05 0.025 0.01 0.005 
Two-Sided Test: α = 0.20 0.10 0.05 0.02 0.01 
N1 = 6 N2 = 7 23/42 4/7 29/42 5/7 5/6 
8 1/2 7/12 2/3 3/4 3/4 
9 1/2 5/9 2/3 13/18 7/9 
10 1/2 17/30 19/30 7/10 11/15 
12 1/2 7/12 7/12 2/3 3/4 
18 4/9 5/9 11/18 2/3 13/18 
24 11/24 1/2 7/12 5/8 2/3 
N1 = 7 N2 = 8 27/56 33/56 5/8 41/56 3/4 
9 31/63 5/9 40/63 5/7 47/63 
10 33/70 39/70 43/70 7/10 5/7 
14 3/7 1/2 4/7 9/14 5/7 
28 3/7 13/28 15/28 17/28 9/14 
N1 = 8 N2 = 9 4/9 13/24 5/8 2/3 3/4 
10 19/40 21/40 23/40 27/40 7/10 
12 11/24 1/2 7/12 5/8 2/3 
16 7/16 1/2 9/16 5/8 5/8 
32 13/32 7/16 1/2 9/16 19/32 
N1 = 9 N2 = 10 7/15 1/2 26/45 2/3 31/45 
12 4/9 1/2 5/9 11/18 2/3 
15 19/45 22/45 8/15 3/5 29/45 
18 7/18 4/9 1/2 5/9 11/18 
36 13/36 5/12 17/36 19/36 5/9 
N1 = 10 N2 = 15 2/5 7/15 1/2 17/30 19/30 
20 2/5 9/20 1/2 11/20 3/5 
40 7/20 2/5 9/20 1/2 
N1 = 12 N2 = 15 23/60 9/20 1/2 11/20 7/12 
16 3/8 7/16 23/48 13/24 7/12 
18 13/36 5/12 17/36 19/36 5/9 
20 11/30 5/12 7/15 31/60 17/30 
N1 = 15 N2 = 20 7/20 2/5 13/30 29/60 31/60 
N1 = 16 N2 = 20 27/80 31/80 17/40 19/40 41/80 


Large-sample approximation: 1.07√((m+n)/(mn)) 1.22√((m+n)/(mn)) 1.36√((m+n)/(mn)) 1.52√((m+n)/(mn)) 1.63√((m+n)/(mn)) 


Source: Adapted by permission from F. J. Massey, Distribution table for the deviation between two sample 
cumulatives, Ann. Math. Stat. 23 (1952), 435-441. 

ᵃ This table gives the values of $D^+_{m,n,\alpha}$ and $D_{m,n,\alpha}$ for which $\alpha \ge P\{D^+_{m,n} > D^+_{m,n,\alpha}\}$ and $\alpha \ge P\{D_{m,n} > D_{m,n,\alpha}\}$ 
for some selected values of N1 = smaller sample size, N2 = larger sample size, and α. 
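The large-sample row of Table ST9 can be evaluated for any pair of sample sizes. The sketch below assumes the row has the standard form c·√((m+n)/(mn)) with constants 1.07, 1.22, 1.36, 1.52, 1.63 for the two-sided levels 0.20 through 0.01; the first two constants are partly illegible in the extracted text, so treat them as an assumption. All names are illustrative.

```python
from math import sqrt

# Assumed constants of the large-sample row, keyed by two-sided alpha.
KS2_COEFF = {0.20: 1.07, 0.10: 1.22, 0.05: 1.36, 0.02: 1.52, 0.01: 1.63}


def ks_two_sample_critical(m, n, alpha_two_sided):
    """Approximate two-sample critical value c * sqrt((m + n) / (m * n))."""
    return KS2_COEFF[alpha_two_sided] * sqrt((m + n) / (m * n))


print(round(ks_two_sample_critical(50, 50, 0.05), 3))
print(round(ks_two_sample_critical(60, 90, 0.01), 3))
```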
Table ST10. Critical Values of the Wilcoxon Signed-Ranks Test Statisticᵃ 

α 
n 0.01 0.025 0.05 0.10 
3 6 6 6 6 
4 10 10 10 9 
5 15 15 14 12 
6 21 20 18 17 
7 27 25 24 22 
8 34 32 30 27 
9 41 39 36 34 
10 49 46 44 40 
11 58 55 52 48 
12 67 64 60 56 
13 78 73 69 64 
14 89 84 79 73 
15 100 94 89 83 
16 112 106 100 93 
17 125 118 111 104 
18 138 130 123 115 
19 152 143 136 127 
20 166 157 149 140 


Source: Adapted by permission from Table 1 of R. L. McCornack, Extended tables of the Wilcoxon matched 
pairs signed-rank statistics, J. Am. Stat. Assoc. 60 (1965), 864-871. 

ᵃ This table gives values of $t_\alpha$ for which $P\{T^+ > t_\alpha\} \le \alpha$ for selected values of n and α. Critical values in the 
lower tail may be obtained by symmetry from the equation $t_{1-\alpha} = n(n+1)/2 - t_\alpha$. 
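The symmetry rule for lower-tail critical values is a one-line computation. The sketch below applies it to an upper-tail entry read from the table; the function name is illustrative and not part of the text.

```python
def wilcoxon_lower_critical(n, t_alpha):
    """Lower-tail value n(n + 1)/2 - t_alpha from the upper-tail entry t_alpha."""
    return n * (n + 1) // 2 - t_alpha


# Table entry for n = 10, alpha = 0.05 is 44; the maximum of T+ is 55.
print(wilcoxon_lower_critical(10, 44))
```

Integer division is safe here because n(n + 1) is always even.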
Table ST11. Critical Values of the Mann-Whitney-Wilcoxon Test Statisticᵃ 

n 
m α 2 3 4 5 6 7 8 9 10 
2 0.01 4 6 8 10 12 14 16 18 20 
0.025 4 6 8 10 12 14 15 17 19 

0.05 4 6 8 9 11 13 14 16 18 

0.10 4 5 7 8 10 12 13 15 16 

3 0.01 9 12 15 18 20 23 25 28 
0.025 9 12 14 16 19 21 24 26 

0.05 8 11 13 15 18 20 22 25 

0.10 7 10 12 14 16 18 21 23 

4 0.01 16 19 22 26 29 32 36 
0.025 15 18 21 24 27 31 34 

0.05 14 17 20 23 26 29 32 

0.10 12 15 18 21 24 26 29 

5 0.01 23 27 31 35 39 43 
0.025 22 26 29 33 37 41 

0.05 20 24 28 31 35 38 

0.10 19 22 26 29 32 36 

6 0.01 32 37 41 46 51 
0.025 30 35 39 43 48 

0.05 28 33 37 41 45 

0.10 26 30 34 38 42 

7 0.01 42 48 53 58 
0.025 40 45 50 55 

0.05 37 42 47 52 

0.10 35 39 44 48 

8 0.01 54 60 66 
0.025 50 56 62 

0.05 48 53 59 

0.10 44 49 55 

9 0.01 66 73 
0.025 63 69 

0.05 59 65 

0.10 55 61 

10 0.01 80 
0.025 76 

0.05 72 

0.10 67 


Source: Adapted by permission from Table 1 of L. R. Verdooren, Extended tables of critical values for Wilcoxon’s 
test statistic, Biometrika 50 (1963), 177-186, with the kind permission of Professor E. S. Pearson, the author, 
and the Biometrika Trustees. 

ᵃ This table gives values of $u_\alpha$ for which $P\{U > u_\alpha\} \le \alpha$ for some selected values of m, n, and α. Critical values 
in the lower tail may be obtained by symmetry from the equation $u_{1-\alpha} = mn - u_\alpha$. 
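The lower-tail rule for the Mann-Whitney-Wilcoxon statistic can be applied the same way. In this sketch the upper-tail entry is passed in by hand from the table; the function name is illustrative.

```python
def mww_lower_critical(m, n, u_alpha):
    """Lower-tail value m*n - u_alpha from the upper-tail entry u_alpha."""
    return m * n - u_alpha


# Table entry for m = 4, n = 4, alpha = 0.05 is 14; U ranges over 0..16.
print(mww_lower_critical(4, 4, 14))
```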
Table ST12. Critical Points of Kendall’s Tau Test Statisticᵃ 

α 
n 0.100 0.050 0.025 0.01 
3 3 3 3 3 
4 4 4 6 6 
5 6 6 8 8 
6 7 9 11 11 
7 9 11 13 15 
8 10 14 16 18 
9 12 16 18 22 
10 15 19 21 25 


Source: Adapted by permission from Table 1, p. 173, of M. G. Kendall, Rank Correlation Methods, 3rd ed., 
Griffin, London, 1962. For values of n > 11, see W. J. Conover, Practical Nonparametric Statistics, John Wiley, 
New York, 1971, p. 390. 

ᵃ This table gives the values of $S_\alpha$ for which $P\{S > S_\alpha\} \le \alpha$, where $S = \binom{n}{2} T$, for some selected values of α 
and n. Values in the lower tail may be obtained by symmetry, $S_{1-\alpha} = -S_\alpha$. 
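To use this table one needs S rather than T. The sketch below computes S as the number of concordant pairs minus the number of discordant pairs and then recovers T = S / C(n, 2), following the relation in the footnote. Tied pairs contribute zero here, which is an assumption of this sketch and not a rule taken from the text; the function names are illustrative.

```python
from itertools import combinations
from math import comb


def kendall_s(x, y):
    """Concordant minus discordant pairs; tied pairs count as zero."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[j] - x[i]) * (y[j] - y[i])
        if product > 0:
            s += 1
        elif product < 0:
            s -= 1
    return s


def kendall_tau(x, y):
    """T = S / C(n, 2), the sample version of Kendall's tau."""
    return kendall_s(x, y) / comb(len(x), 2)


print(kendall_s([1, 2, 3, 4], [1, 2, 3, 4]), kendall_tau([1, 2, 3], [1, 3, 2]))
```

The computed S is then compared with the entry in the row for the sample size n.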


Table ST13. Critical Values of Spearman’s Rank Correlation Statisticᵃ 

α 
n 0.01 0.025 0.05 0.10 
3 1.000 1.000 1.000 1.000 
4 1.000 1.000 0.800 0.800 
5 0.900 0.900 0.800 0.700 
6 0.886 0.829 0.771 0.600 
7 0.857 0.750 0.679 0.536 
8 0.810 0.714 0.619 0.500 
9 0.767 0.667 0.583 0.467 

10 0.721 0.636 0.552 0.442 


Source: Adapted by permission from Table 2, pp. 174-175, of M. G. Kendall, Rank Correlation Methods, 3rd 
ed., Griffin, London, 1962. For values of n > 11, see W. J. Conover, Practical Nonparametric Statistics, John 
Wiley, New York, 1971, p. 391. 

ᵃ This table gives the values of $R_\alpha$ for which $P\{R > R_\alpha\} \le \alpha$ for some selected values of n and α. Critical 
values in the lower tail may be obtained by symmetry, $R_{1-\alpha} = -R_\alpha$. 


ANSWERS TO SELECTED PROBLEMS 


Problems 1.3 
1. (a) Yes; (b) yes; (c) no. 2. (a) Yes; (b) no; (c) no. 
6. (a) 0.9; (b) 0.05; (c) 0.95. 7. 1/16. 8. i + 5én2 = 0.487. 


Problems 1.4 


(YUE IC) som siete 

stows a (MET/MET) 

12. (@) a/( 7 ) co 94) /( 7 ) (©) 13( ° )/( : ) 
ono(3)0(2)/(3)@(2)-20-4/(3) 
 oay—4—91/() @a( > )( 
»~QYMI OMI AUDA 
Problems 1.5 

— iy 
3. a(pby (" ) pc oy 4.p/(2—p) 

l=0 é 

zi | 

5. sum r'/oume=s a5 for large N 6.n=4 
10. Bice 11. bape (b) 1/3 12. 0.08 
13. (a) 173/480 (b) 108/173; 15/173 14. 0.0872 
Problems 1.6 
1.1/(2—p); (1—p)/(2—-p) 4. p’(1—p)*[3—7Tp(1 —p)| 


12. For any two disjoint intervals /), J C (a,b), (1 )€(b) = (b—a)l( 1h), where 
£(1) = length of interval /. 


ae 8/36 ifn=1 
. Pna>= n— n— n— 
2(%5)" (8) +2(35)" Ge) +2088)" Gs) > 2 
(b) 22/45 


(c) 12/36; 2(2)" (35) (as) +2(38)"” (58) (Se) +2(88)”* 38) (%) 
forn =2,3,.... 


Problems 2.2 

3. Yes; yes 

4. 6;{(1,1,1,1,2),(1,1,1,2,1),(1,1,2,1,1), (1,2, 1, 1,1), (2,1,1,1,1)}; {(6,6, 6, 6, 6) }; 
1(6;6;6,6,6),(6,6,6,6,5),(6,6,6,5,6),(6,6,5,6,6),(6,5,6,6,6),(5,6,6,6,6)} 

5. Yes; (1/4, 1/2) U (3/4, 1) 


Problems 2.3 
1. 
x 0 1 2 3 
P(X = x) 1/8 3/8 3/8 1/8 
F(x) = 0, x < 0; = 1/8, 0 ≤ x < 1; = 1/2, 1 ≤ x < 2; = 7/8, 2 ≤ x < 3; = 1, x ≥ 3. 
3. (a) Yes; (b) yes; (c) yes; yes 
Problems 2.4 
Ldap a(=p "Nan 
2% ©l/s Wer 


3. Yes; $F_\theta(x) = 0$ for $x < 0$, $= 1 - e^{-\theta x} - \theta x e^{-\theta x}$ for $x \ge 0$; $P(X > 1) = 1 - F_\theta(1)$ 

4. Yes; F(x) =0,x<0;=1-(1+ xh; )e*/? forx > 0 

6. $F(x) = e^{x}/2$ for $x \le 0$, $= 1 - e^{-x}/2$ for $x > 0$ 

8. (c), (d), and (f) 

9. Yes; (a) 1/2,0<x<1,1/4for2<x<4; (b) 1/(28), |x| <0; 
(c)xe~*,x > 0; (d) (x—1)/4 for 1 < x < 3, and P(X =3) = 1/2; 
(e) 2xe-* , x > 0 

10. If S(x) = 1— F(x) = P(X > x), then S’(x) = —f(x) 


Problems 2.5 

2.X4£1/x 

4. ae = exp(—276)] 1 -y le* arc COS y 4. e727 0+0 arc COS »] ; ly| <1; 
dexp{—6 arctan z}[(1+27)(1—e79"|-}, z>0 


dexp{—né — arctan z}[(1+27)(1 —e—°7)]-!, z<0 
10. fix\(y) = 2/3 forO< y<1,=1/3 forl<y<2 
12. (a) 0,y < 0; F(0) for —1 <y <1, and 1 for y> 1; 
(b) = 0 if y < —b, = F(—b) if y = —b, = F(y) if -b<y <b, =1ify>b; 


(c) = F(y) ify < —b, = F(—b) if —b< y <0, =F(b) if0<y <b, =F(y). 


ify>b. 


Problems 3.2 

3. EX**=0 if 2r < 2m—1 is an odd integer, 
= P(m—r+!)r (r+!) 

9. 3p =a(1—v)/v, where v = (1 —p)!/* 

10. Binomial: $\alpha_3 = (q-p)/\sqrt{npq}$, $\alpha_4 = 3 + (1-6pq)/(npq)$ 
Poisson: $\alpha_3 = \lambda^{-1/2}$, $\alpha_4 = 3 + 1/\lambda$. 


if 2r < 2m— 1 is an even integer 


Problems 3.3 


Lb) ee —1)/(1—e); © pl —(gs)"")/[ —gs\(1— gh"), 8 < 1/4. 


6. f (95) /f(); Fe") /F(9). 


Problems 3.4 
2 _ _ _ soy. 
3. For any ae 0 take P(X yee oe P(X = -£) = oa 1 £0. 
5. P(x? _ | = Se 1 <K < J2 
_ot 
P(X? = Ko?) = hoe. 
Problems 4.2 
Be Seen: 7. Marginals negative binomial, so also conditionals. 


8. A(y|x) = 5 (CP? +2°)/(P +2 +y)”. 

9. X ~ B(pi,p2+p3); ¥/(1—x) ~ B(p2,p3). 

10. X ~ G(a, 1/8), Y~ Ga+y,1/8), X/y ~ Bla, 7), Y-x~ Gly, 1/8). 
14. P(X <7)=1—e77 15, 1/24; 15/16. 17. 1/6. 


Problems 4.3 
3. No; Yes; No. 10. = 1 - a/(2b) if a < b, = b/(2a) if a > b. 
11. λ/(λ + μ); 1/2. 


Problems 4.4 

2. (b) $f_{V|U}(v \mid u) = 1/(2u)$, $|v| < u$, $u > 0$. 

6. P(X =x,M =m) =n(1—7)"™(1—(1—2)"*] ifx=m, =7?(1—1)"?* 
if x<m. P(M =m) =2n(1—71)"™—2n(2—7)(1—7)*", m>0. 
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7. f(x) = ke Jk, <x <k+1, k=0,1,2,... 
11. $f_U(u) = 3u^2/(1+u)^4$, $u > 0$. 


13. (a) Fu,v(u,v) = [I exp ( we) | (4X) if u > 0, |v] < 7/2, 


207 


= 1-exp|l eee I, ifu>0,v> a /2, = 0 elsewhere. 
1 enw v 
OF (4) = Jee Tapa 


Problems 4.5 


get 9ft2 . 
2. EX'Y! = CESVCED + 3EED EFS" 3. cov(X, Y) = 0; X, Y dependent. 
15. $M_{U,V}(u,v) = (1-2v)^{-1}\exp\{u^2/(1-2v)\}$ for $v < 1/2$; $\rho(U,V) = 0$; no. 
18. $\rho_{Z,W} = (\sigma_2^2 - \sigma_1^2)\sin\theta\cos\theta/\sqrt{\operatorname{var}(Z)\operatorname{var}(W)}$. 


21. If U has PDF f, then EX” = EU" /(m+ 1) for m> 0; p = 4 — Tare 
3 var 5 


Problems 4.6 
l.utol[f(=*) f(2#)] (54) &(*)] where © is the standard normal DF. 
2. (a) 2(1+X). 3. E{X|y} = 1 + pS (y— p2). 4, E(var{Y|X}). 
6.4/9. 7.(a)1; (b) 1/4. 8.x*/(k+1); 1/(1+%). 
Problems 4.7 
5. (a) (S71) /B;  (b) 7h. 
j=l 

Problems 5.2 
5. Fr(y)=( 7 )/( . ),PW=y) = ( yo \/( i ).y>M+1, and 

M M? M-1 M/ , 


N 
P(Y=M)=1/( +f Lb PCtacgely =y)= Peat a O< xi <y, 
i=1,...,j, 4 Ax forifxj. 


9. P(Y; =x) =qp* +pq,x>1.P(%,=x)=p’¢'+¢p',x>1 
P(Y, =x) = P(Y, =x) for n odd; = P(¥2 = x) for n even. 


Problems 5.3 
2 (a) P{ F(X) = Dino ( : pp) b = ( : )p*(1—p)""*, x=0,1,...,n. 


13. = 1 ie ie 1eZ zn) 


22. X/|Y¥| ~ C(1,0); (2/m)14+2)7!1,0<z< 00. 
W7.(@t/e; C)=0ift<0,—a/titi>é; @(a/sjr. 
29. (b) 1/(2\/m); 1/2. 
Problems 5.4 

1. (@) oy = 45 fp = 15/4, p= —3/4;  (b) N(6— 2x, 8); (©) 0.3191. 

4. BN(aju +b, cu2 +d,a’o7,c?0%, p). 6. tan d= EX? /EY?, 1.3, =e. 
Problems 6.2 


LAX =0) =P X= 1H 1/8, P( 5 =1/ 3) SP (% =]2/3) = 378 

PS =0) =1/4, 79 =1/3) =3 4, 

ao 1 Is- 2 25 3 35 4 £45 5 5.5 6 
“p(X) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36 


Problems 6.3 

1. {F(min(z, y))- F(x)F(y)}/n. 

6. E(S?)* = Gop (n—1)(n+2)-+ (n+ 2k—3), k> 1. 

9, (a) P(X = t) =e—"™(ndA)™ /(tn)!, t= 0,1/n,2/n,...,  (b) C(1,0); 
(c) P(nm/2,2/n). 10. (b) 2/,/an; 3 +6/(an). 

11. 0,1,0,E(Xn—0.5)4/(144n?). 12. var(S2) = 1(\+ 22°) > var(X). 


Problems 6.4 
ee 2)); 2n*{(m+6)* + (n— 2)\m+ 20) }/\m *(n—2)?(n—4)]. 


3. 6, fF vn> Ui gig(14+8)— (8 Fh ve n> 2. 


TC 
11. 2m"/?n wnt me? aye e" /B( 2,3), 00 <Z< 00. 


Problems 6.5 
Lt(n—1) 2.t(m+n—2) 3. (25)'T (25144) /T (52). 


n—1 
Problems 6.6 
29 PH GED 
3. [2m (1 — 2)? [1 4 eed "both ~ t(n). 
4./n—1T~t(n—1). 


Problems 7.2 


1.No. 2. Yes 
3.%, > Y~ F(y) =0if y <0, =1-—e-/ ify>0. 
4. F(y) =0ify<0,=1—e” ify>0. 
9.@(1,0) 12.No 
13. (a) exp(—x—), x > 0; EX* =T'(1—k/a), k <a. 
(b) exp(—e*), -co <x < 00; M(t) =T(1—-1),t< 1. 
(c) exp{—(—x)*}, x < 0; EX* = (-1)'T'(1+k/a),k > -a. 
20. (a) Yes; No (b) Yes; No. 


Problems 7.3 


3. Yes; $A_n = n(n+1)\mu/2$, $B_n = \sigma\sqrt{n(n+1)(2n+1)/6}$ 
5. (a) M,(t) > Oasn— co; no. (b) M,,(t) diverges as n + 00 
(c) Yes (d) Yes (e)M,— e* /4;no. 
Problems 7.4 
1. (a) No; (b) No. 2. No. 3. For α < 1/2. 7. (a) Yes; (b) No. 


Problems 7.5 


4, Degenerate at 3. 5. Degenerate at 0. 
6. For p > 0, N(0, \/p), and for p < 0, S,/n—> degenerate. 


Problems 7.6 


1.(b) No; (c) Yes; (d) No. 
2.N(0,1). 3. N(0,07/67). 4.163. 8. 0.0926; 1.92 


Problems 7.7 
1. (a) AN(u?, 4707) for uy 40, = /o2 +? (1) for p = 0, 02 = 07 /n. 


(b) For pp #0, 1/K ~ AN(1/p,02/p:4); for ju = 0, o,/X, > 1/N(0, 1). 
(c) For p £0, én|X| ~ AN(én|p|,02/7); for uw = 0, én(|X|/on) — €n|N(0, 1)]. 
(d) AN(e“, e742). 

2.c=1/2 and VX ~ AN(VX, 1/4). 


Problems 8.3 
2.No. 7. fo,(x)/fo,(x). 9.No. 10. No. 
11. (b) Xin; (&) (XS?)  (IDeTa ~X; ) (h) X((1)X(2)3«+«sX(n)): 
Problems 8.4 
_ r >. (n—1\P/2 P(e 
2. (951)? EL or: (Gy EE s 
2.87 aS: var (St) = (mh) 220t <sar(S?) = aa 4.No; 5. No. 


6("*)/("), O0<s<t<n, f= 3 w=(*)/(")itost<s, 
=5 
=2/(" )ite=s.ana ("7") /(")itstisren 

t t—s t 


z — yi a t=, 11. (a) NX/n; (b) No. 


n— 


12.t=Shx,1-(1-") if >t, and Life < to. 
13. (a) With t= iy, jo GD) GE tes (©) (U-1/n)s 


(d) (1—1/n) [1+ = 4). 
14. With t = x(,), [f°(t) — (t— I)"v(t— 1)]/[" — (#-1)"], > 1. 


15. With t= 71x, ( : ) Qiao 
Problems 8.5 


1. (a), (c), (d) Yes; (b) No. 2. 0.64761 /n?. 
3. n-! sup{x2/[e* —1]}. 5. 20(1—8)/n 
x40 


Problems 8.6 

2.9 =(n=1)S/0R),=X/8 3. f= ae a , 
4. & = X(K —X2)[X2 “2 »X? = 1X? /n B = (1-X)(K- 7) —- XY. 
5. ju = ln{X /[X2]!/2}, 6? = en{X2/X }, Ye eG 


Problems 8.7 


1. (a) med(Xj); (b) Xs ()n/ LY XP; A) —n/ D7) Aa —X)). 
2.(a)X/n;  (b) by = 1/2 if X < 1/2, =X if 1/2 <X <3/4, = 3/4 if X > 3/4; 


; 6, ifX>0 ; = 
cd=< .’ ~~~ where 0) = —* + 4/X2+(*)?2, 
(c) i ifX <0 2 () 
6, = —* —4/xX2 + (*)2, X2 = XP /n: 


(d) 6 = if ny,n3 > 0; = any value in (0,1) if n, =n3 = 0; 
no mle if 1) = 0,n3 #0; no mle ifn, 4 0,n3 = 0; 
()8=—-3+3 1+4x2; ()6=X. 
3. ~=—O-'(m m/n). 
4.(a)@=Xiay,B=i(Xi—4)/n; (BD) A= Pog(X > I =e a<, 
=l,a>1.A=1if&>1,=exp{(@—1)/A} ifa<1. 
5.6=1/X. 6. fi= DlnX;/n, 6? = (nx; — fi)? /n. 
ee - eG 


> 


Problems 8.8 

2. (a) (Ex +1)/(n+1); (bo) (2H) 3K 5. X/n. 

6. (X +1)(X +n)/[(n+2)(n+3)]. 8. (a +n) max(a,X(,))/(atn—1). 
Problems 8.9 


5. (c) (n+2) [(X(ny/2) 7D = (X(1y) 7 @*)] /{(n+ 1)[(X(n /2)- 9) = (X¢1y)-@)]} 
10. (=X;)‘P(n +k) /T'(n+ 2k) 


Problems 9.2 


1.0.019, 0.857. 2.k = po + 02a/ Vn; 1=O(zq— HSH Yn), 
5. exp(—2); exp(—2/0), 60 > 1. 
Problems 9.3 

1. d(x) = 1 if x < Oo(1— V1 —a) = 0 otherwise. 

4. b(x) = 1 if ||x| 1] >k. 5. $(x) = Lif.xy) > ¢ = 09 — fn(al/”). 

11. If 0) < 0, o(x) = 1 if xq) > Oa7!/", and if 0; < 4%, then 4(x) = 1 if x1) 
< #(=—a')-", 

12. O(x) = 1lifx < Va/2 or > 1- Va/2. 


Problems 9.4 


1. (a), (b), (c), (d) have MLR in ©X;; (e) and (f) in ll; Xj 
4. Yes. 5. Yes; yes. 


Problems 9.5 


1. b(x1,x2) = 1 if |x) —2x2| > c, = 0 otherwise, c = V 2204/2: 
2. (x) = 1 if Dx; > k. Choose k from a = P), (571 X; > k). 


Problems 9.6 

3. d(x) = 1 if (no. of x;’s > O— no. of x;’s <0) >k. 

Problems 10.2 

2. Y =# of x1,x2 insample, Y <c,; or Y > cp. 3.X <c, or > cp. 
4.8? >c, or <cp. 5. (a) X(n) >No; (b) X(n) > No or <c. 

6. |X —09/2| >. T.(a)X <c)or>c2; (b)X>c. 

11. X(1) > 0 — Ln(a)!/”. 12. X(¥) > Oa”. 

Problems 10.3 

1. Reject at α = 0.05. 3. Do not reject H0 : p1 = p2 = p3 = p4 at 0.05 level. 
4. Reject H0 at α = 0.05. 5. Reject at 0.10 but not at 0.05 level. 
7. Do not reject H0 at α = 0.05. 8. Do not reject H0 at α = 0.05. 
10. U = 15.41. 12. P-value = 0.5447. 

Problems 10.4 

1. t = -4.3, reject H0 at α = 0.02. 2. t = 1.64, do not reject H0. 

5. t = 5.05. 6. Reject H0 at α = 0.05. 7. Reject H0. 8. Reject H0. 


Problems 10.5 

1. Do not reject H0 : σ1 = σ2 at α = 0.10. 

3. Do not reject H0 at α = 0.05. 4. Do not reject H0. 
Problems 10.6 


2. (a) $\varphi(x) = 1$ if $\sum x_i = 5$, $= 0.12$ if $\sum x_i = 4$, $= 0$ otherwise; 
(b) Minimax rule rejects H0 if $\sum x_i = 4$ or 5, and with probability 1/16 if $\sum x_i = 3$; 
(c) Bayes rule rejects H0 if $\sum x_i > 2$. 
3. Reject H0 if $\bar{X} < (1 - 1/n)\ln 2$; 
$\beta(1) = P(Y < (n-1)\ln 2)$, $\beta(2) = P(Z < (n-1)\ln 2)$, where Y ~ G(n, 1) and 
Z ~ G(n, 1/2) 


Problems 11.3 
1. (77.7, 84.7). 242, _ 7. (2X, QE Gina i): 


9, (2X/(2—A1), 2X/(2—Az)), 4B — A? = 4(1 —a). 10. [a!/"N]. 
Hen > tifa) | 
= [én(1+d/X ny) ° = 
12. Choose k from a = (k+ 1)e7 13. Xt %qo//n 
14. (0X? /c2, UX?/c1), eae x2 (y y)dy =1~—cand f° yx2(y)dy =n(1—a). 
15. Posterior B(n + a, Dx; +B—n). 
16. h(ulx) = /Zexp{—4(u— 3) }H[@(va(1 —¥)) — ©(—/n( 1 +3))], where & 
is standard normal DF. 


Problems 11.4 


1. (X(1) — X3,0/ (2), X()). 
2. (2nX/b, 2nX/a), choose a .b from J x3, (u)du = 1— a, and a?x3,,(a) = b’x3,,(b), 
where y2(x) is the PDF of x7(v) RV. 
3. (X/(1—b),X/(1—a)), choose a,b from 1 — a = b? — a’ and a(1 —a)* = b(1—b)?. 
4.n= [4zt_ a /2/4"] +1;n> (i/o)inti/a), 


Problems 11.5 

1. (X(ny,@7/"X (ny). 

2. (2X; /A2, 2U:X;/A1), where 1, A2 are solutions of Atfona(A1) = Azfona (Az) and 
P(1)= ; Oty is y?(v) PDF. 

3. (Xa) — ~e * X(1)): 5. (al/"Xq),X (1): 8. Yes. 


Problems 12.3 

yh ap [Go=a6 lyn (G—1?/E5 
4. Reject Ho : ao = Ao if VE &o— Gy t;)2/(n—2) 
8. Normal equations Bodixt + Bi Sat! + Byatt? = SYixt, k=0,1,2. 


Reject Ho : Bo = Oi {|Bol/ VAY// BC (¥;— Bo — Bix; — f2x2)} > co, where 


By = Dei¥; and By = ¥ — 61%, By = D(x; —¥)(¥i— Y)/E(xi -—¥)?. 
10. (a) By = 0.28, 8; =0.411;  (b) t= 4.41, reject Hp. 


= Cy. 


Problems 12.4 


= 10.8. 3. Reject at a = 0.05 but not at a = 0.01. 
4. BSS = 28.57, WSS = 26, reject at a = 0.05 but not at 0.01. 
5. F = 56.45. 6. F = 0.87. 


Problems 12.5 


4. SS Methods = 50, SS Ability = 64.56, ESS = 25.44; reject Ho at a = 0.05, not at 0.01. 
5. Fyariety = 24.00. 
Problems 12.6 
‘ .- am ye (Fj. —y) 
2. Reject Ao if DUD, =O, 


4. SS1 (machines) = 2.786, d.f. = 3; SSI = 73.476, d.f. = 6; 
SS2 (machines) = 27.054, d.f. = 2; SSE = 41.333, d.f. = 24. 


5. Source d.f. SS F 
Cities 3 227.27 4.22 
Auto 3 3695.94 68.66 
Interactions 9 9.28 0.06 
Error 16 287.08 
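The last column of this answer can be checked from the other two. The sketch below reads the three numeric columns as degrees of freedom, sum of squares, and F ratio (an interpretation of the flattened table, supported by the arithmetic), and forms each F as the ratio of the effect mean square to the error mean square. The variable and function names are illustrative.

```python
# (degrees of freedom, sum of squares) for each effect in the answer.
effects = {
    "Cities": (3, 227.27),
    "Auto": (3, 3695.94),
    "Interactions": (9, 9.28),
}
error_df, error_ss = 16, 287.08


def f_ratio(df, ss, err_df, err_ss):
    """Mean square of the effect divided by the error mean square."""
    return (ss / df) / (err_ss / err_df)


for name, (df, ss) in effects.items():
    print(name, round(f_ratio(df, ss, error_df, error_ss), 2))
```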

Problems 13.2 


1. dis estimable of degree 1; (number of x;’s in A) /n. 
2. (a) (mn)! DX;ZY;;  (b) S? +82. 
3. (a) EX)¥;/n; (b) U(X; + ¥; -X—Y)?/(n— 1). 


Problems 13.3 


3. Do not reject H0. 7. Reject H0. 10. Do not reject H0 at 0.05 level. 
11. T+ = 133, do not reject H0. 
12. (Second part) T+ = 9, do not reject H0 at α = 0.05. 


Problems 13.4 

1. Do not reject H0. 2. (a) Reject; (b) Reject. 

3. U = 29, reject H0. 5. d = 1/4, do not reject H0. 

7. t = 313.5, z = 3.73, reject; r = 10 or 12, do not reject at α = 0.05. 
Problems 13.5 

1. Reject H0 at α = 0.05. 4. Do not reject H0 at α = 0.05. 
9. (a) t = 1.21; (b) r = 0.62; (c) Reject H0 in each case. 
Problems 13.6 

1.(a) 5; (b) 8. 3. p"-2(n+p—np) <1. 

4.n> (z1~4./po(l — po) — z1-5 Vpi(1 —p1))?/(p1 — po)”. 
Problems 13.7 


1. ©) E{n(X — )?}/ES? = 1+ 2p(1—2p/n)~; ratio = 1 if p=0, > 1 for p > 0. 
2. Chi-square test based on (c) is not robust for departures from normality. 
applications of, 327 
Chapman, Robbins and Kiefer inequality, 
377 
for discrete uniform, 378 
for normal, 379 
for uniform, 378 
Characteristic function, 87 
of multiple RVs, 136 
properties, 136 
Chebychev-Bienayme inequality, 94 
Chebychev’s inequality, 94 
improvement of, 95 
Chi-square distribution, central, 206, 261 
MGF, 207, 262 
moments, 207, 262 
as square of normal, 221 
noncentral, 264 
MGF, 264 
moments, 264 
Chi-square test(s), 472 
as a goodness of fit, 476 
for homogeneity, 479 
for independence, 608 
one-tailed, 472 
robustness, 631 
for testing equality of proportions, 473 
for testing parameters of multinomial, 
475 
for testing variance, 472 
two-tailed, 472 
Combinatorics, 20 
Complete, family of distributions, 347 
Complete families, binomial, 348 
chi-square, 348 
discrete uniform, 358 
hypergeometric, 358 
uniform, 348 
Complete sufficient statistic, 347, 576 
for Bernoulli, 348 
for exponential family, 350 
for normal, 351 
for uniform, 349 
Concordance, 611 
Conditional, DF, 108 
distribution, 107 
PDF, 109 
PMF, 108 
probability, 26 
Conditional expectation, 158 
properties of, 158 
Confidence, bounds, 500 
coefficient, 500 
estimation problem, 500 
Confidence interval, 499 
Bayesian, 511 
equivariant, 527 
expected length of, 517 
general method(s) of construction, 504 
level of, 500 
length of, 500 
percentile, 531 
for location parameter, 623 
for the parameter of, Bernoulli, 513 
discrete uniform, 516 
exponential, 509 
normal, 502-503
uniform, 509, 515 

for quantile of order p, 621 

shortest-length, 516 

from tests of hypotheses, 507 

UMA family, 502 

UMAU family, 524 

for normal mean, 524 

for normal variance, 526 

unbiased, 523 

using Chebychev’s inequality, 513 

using CLT, 512 

using properties of MLE’s, 513 
Conjugate prior distribution, 408 

natural, 408 
Confidence set, 501 

for mean and variance of normal, 

502 

UMA family of, 502 

UMAU family of, 524 

unbiased, 523 
Consistent estimator, 340 

asymptotically normal, 341 

in rth mean, 340 

strong and weak, 340 
Contaminated normal, 625 
Contingency table, 608 
Continuity correction, 328 
Continuity theorem, 317 
Continuous type distributions, 49 
Convergence, a.s., 294 

in distribution = weak, 286 

in law, 286 

of MGFs, 316-317 

modes of, 285 

of moments, 287 

of PDFs, 287 

of PMFs, 287-288 

in probability, 288 

in rth mean, 292 
Convolution of DFs, 135 
Correlation, 144 
Correlation coefficient, 144, 277 

properties, 145 
Countable additivity, 7 
Covariance, 144 

sample, 277 
Coverage, elementary, 619 

r-coverage, 620 

probability, 619 
Credible sets, 511
Critical region, 431 


Decision function, 401 
Degenerate RV, 173 
Degrees of freedom when pooling classes, 
479 
Delta method, 332 
Density function, probability, 49, 104 
Design matrix, 539 
Dichotomous trials, 174
Discordance, 611 
Discrete distributions, 173 
Discrete uniform distribution, 175 
Dispersion matrix = variance-covariance
matrix, 328
Distribution, conditional, 107 
conjugate prior, 408 
of a function of an RV, 55 
induced, 59 
a posteriori, 404 
a priori, 403 
of sample mean, 257 
of sample median, 259 
of sample quantile, 167, 336 
of sample range, 162, 326 
Distribution function, 43 
continuity points of a, 43, 50 
of a continuous type RV, 49 
convolution, 135 
decomposition of a, 53 
discontinuity points of a, 43 
of a discrete type RV, 47 
of a function of an RV, 56 
of an RV, 43 
of multiple RVs, 100, 102 
Domain of attraction, 321 


Efficiency of an estimate, 382 
relative, 382 
Empirical DF = sample DF, 249 
Equal likelihood, 1 
Equivalent RVs, 119 
Estimable function, 360 
Estimable parameter, 576, 581 
degree, 577, 581 
kernel, 577, 582 
Estimator, 338 
equivariant, 340, 420 
Hodges-Lehmann, 631 
least squares, 537 
minimum risk equivariant, 422 
Pitman, 424, 426 
point, 338 
Event, 3 
certain, 8 
elementary = simple, 3 
disjoint = mutually exclusive, 7, 33 
independent, 31 
null, 3 
Exchangeable random variables, 120, 149, 
255 
Expectation, conditional, 158 
properties, 158 
Expected value = mean = mathematical 
expectation, 68 
of a function of RV, 67, 136 
of product of RVs, 148 
of sum of RVs, 147 
Exponential distribution, 206 
characterizations, 208 
memoryless property of, 207 
MGF, 206 
moments, 206 
Exponential family, 242 
k-parameter, 242 
natural parameters of, 243 
one-parameter, 240 
Extreme value distribution, 224 


Factorial moments, 79 
Factorization criterion, 344 
Finite mixture density function, 225 
Finite population correction, 256 
Fisher Information, 375 
Fisher’s Z-statistic, 270 
Fitting of distribution, binomial, 482 
geometric, 482 
normal, 477 
Poisson, 478 
Fréchet, Cramér, and Rao inequality, 374 
Fréchet, Cramér, and Rao lower bound, 
375 
binomial, 376 
exponential, 385 
normal, 385 
one-parameter exponential family, 377 
Poisson, 375 
F-distribution, central, 267
moments of, 267 
noncentral, 269 
moments of, 269 
F-test(s), 489 
of general linear hypothesis, 540 
as generalized likelihood ratio test, 
540 
for testing equality of variances, 440 


Gamma distribution, 203 
bivariate, 113 
characterizations, 207 
MGF, 205
moments, 206 
relation with Poisson, 208 

Gamma function, 202 

General linear hypothesis, 536 
canonical form, 541 
estimation in, 536 
GLR test of, 540 

General linear model, 536 

Generalized Likelihood ratio test, 464 
asymptotic distribution, 470 
F-test as, 468 
for general linear hypothesis, 540 
for parameter of, binomial, 465 
for simple vs. simple hypothesis, 464 

bivariate normal, 471 
discrete uniform, 471 
exponential, 472 
normal, 466 

Generating functions, 83 
moment, 85 
probability, 83 

Geometric distribution, 84, 180 
characterizations, 182 
memoryless property of, 182 
MGF, 180
moments, 180 
order statistic, 164 
PGF, 84 

Glivenko-Cantelli theorem, 322 

Goodness-of-fit problem, 584 


Hazard (= failure rate) function, 227
Helmert orthogonal matrix, 274 
Hodges-Lehmann estimators, 631 
Hölder’s inequality, 153
Hypergeometric distribution, 184
bivariate, 113 
mean and variance, 184 
Hypothesis, tests of, 429 
alternative, 430 
composite, 430 
null, 430 
parametric, 430 
simple, 430 
tests of, 430 


Identically distributed RVs, 119 
Implication rule, 11 
Inadmissible decision rule, 416 
Independence and correlation, 145 
Independence of events, 115 
complete = mutual, 118 
pairwise, 118 
Independence of RVs, 114-121 
complete = mutual, 118 
pairwise, 118 
Independent, identically distributed rv’s, 
119 
sequence of, 119 
Indicator function, 41 
Induced distribution, 59 
Infinitely often, 309 
Interactions, 566
Invariance, of hypothesis testing problem, 
455 
principle, 455 
Invariant, 
decision problem, 419 
family of distributions, 418 
function, 420, 455 
location, 421 
location-scale, 421 
loss function, 420 
maximal, 505 
scale, 420 
statistic, 420 
Invariant, class of distributions, 419 
estimators, 420 
maximal, 422, 455 
tests, 455 
Inverse Gaussian PDF, 228 


Jackknife, 533 
Joint, DF, 100-102 
PDF, 104
PMF, 103 
Jump, 47, 103 
Jump point, of a DF, 47, 103 


Kendall’s sample tau, 612 
distribution of, 612 
generating function, 92 
Kendall’s tau coefficient, 611 
Kendall’s tau test, 612 
Kernel, symmetric, 577, 582 
Kolmogorov’s, inequality, 312 
strong law of large numbers, 315 
Kolmogorov-Smirnov one sample statistic,
584 
for confidence bounds of DF, 587 
distribution, 585-587 
Kolmogorov-Smirnov test, 602 
comparison with chi-square test, 588 
one-sample, 587 
two-sample, 603 
Kolmogorov-Smirnov two sample statistic, 
601 
distribution, 603 
Kronecker lemma, 313 
Kurtosis, coefficient of, 83 


Laplace = double exponential distribution, 
91, 224 
MGF, 87 
Least square estimation, 537 
principle, 537 
restricted, 537 
L’Hospital rule, 323
Likelihood, 
equal, 1 
equation, 389 
equivalent, 353 
function, 389 
Limit, inferior, 11 
set, 11 
superior, 11 
Lindeberg central limit theorem, 325 
Lindeberg-Levy CLT, 323 
Lindeberg condition, 324 
Linear combinations of RVs, 147 
mean and variance, 147, 149 
Linear dependence, 145 
Linear model, 536 
Linear regression model, 538, 543
confidence intervals, 545 
estimation, 543 
testing of hypotheses, 545-546 

Locally most powerful test, 459 

Location family, 196 

Location-scale family, 196 

Logistic distribution, 223 

Logistic function, 551 

Logistic regression, 550 

Lognormal distribution, 88, 222 

Loss function, 339, 401 

Lower bound for variance, Chapman, 
Robbins and Kiefer inequality, 377 
Fréchet, Cramér and Rao inequality, 372 

Lyapunov condition, 326 

Lyapunov inequality, 96 


Maclaurin expansion of an mgf, 86 
Mann-Whitney statistic, 604 
moments, 582 
null distribution, 605 
Mann-Whitney-Wilcoxon test, 605 
Marginal, 
DF, 107 
PDF, 106 
PMF, 105 
Markov’s inequality, 94 
Maximal invariant statistic, 422, 455 
function of, 457 
Maximum likelihood estimation, principle 
of, 389 
Maximum likelihood estimator, 389 
asymptotic normality, 397-398 
consistency, 397 
as a function of sufficient statistic, 394 
invariance property, 396 
Maximum likelihood estimation method 
applied to, Bernoulli, 392 
binomial, 399 
bivariate normal, 395 
Cauchy, 399 
discrete uniform, 390 
exponential, 396 
gamma, 393 
hypergeometric, 391 
normal, 390 
Poisson, 399 
uniform, 391, 394 
Mean square error, 339, 362 
Median, 80, 82
Median test, 600 
Memoryless property, 
of exponential, 207 
of geometric, 182 
Method of finding distribution, 
CF or MGF, 90, 137 
DF, 56, 124 
transformations, 128
Methods of finding confidence interval 
Bayes, 511 
for large samples, 511 
pivot, 504 
test inversion, 507 
Method of moments, 386 
applied to, beta, 388 
binomial, 387 
gamma, 388 
lognormal, 388 
normal, 388 
Poisson, 386 
uniform, 387 
Minimal sufficient statistic, 354 
for beta, 358 
for gamma, 358 
for geometric, 358 
for normal, 355 
for Poisson, 358 
for uniform, 354, 358 
Minimax, estimator, 402 
principle, 402 
solution, 492 
Minimax estimation for parameter of, 
Bernoulli, 402 
binomial, 412 
hypergeometric, 414 
Minimum mean square error estimator, 339 
for variance of normal, 368 
Minimum risk equivariant estimator, 421
for location parameter, 424 
for scale parameter, 425 
Mixing proportions, 225 
Minkowski inequality, 153 
Mixture density function, 224-225 
Moment, about origin, 70 
absolute, 70 
central, 77 
condition, 73 
factorial, 79
of conditional distribution, 158 
of DF, 70
of functions of multiple RVs, 136 
inequalities, 93 
lemma, 74-75
non-existence of order, 75 
of sample covariance, 257 
of sample mean, 253 
of sample variance, 253-254 
Moment generating function, 85 
continuity theorem for, 317 
differentiation, 86 
existence, 87 
expansion, 86 
limiting, 316 
of linear combinations, 139 
and moments, 86 
of multiple RVs, 136 
of sample mean, 256 
series expansion, 86 
of sum of independent RVs, 139 
uniqueness, 86 
Monotone likelihood ratio, 446 
for hypergeometric, 448 
for one-parameter exponential family, 447 
UMP test for families with, 447 
for uniform, 446 
Most efficient estimator, 382 
asymptotically, 382 
as MLE, 395 
Most powerful test, 432 
for families with MLR, 446 
as a function of sufficient statistic, 440 
invariant, 456 
Neyman-Pearson, 438 
similar, 433 
unbiased, 432 
uniformly, 432 
Multidimensional RV = multiple RV, 99
Multinomial coefficient, 23 
Multinomial distribution, 190 
MGF, 190 
moments, 191 
Multiple RV, 99 
continuous type, 104 
discrete type, 103 
functions of, 123 
Multiple regression, 543 
Multiplication rule, 27 
Multivariate hypergeometric distribution, 
192 
Multivariate negative binomial
distribution, 193 
Multivariate normal, 234 
dispersion matrix, 236 


Natural parameters, 243 
Negative binomial (=Pascal or waiting 
time) distribution, 178-179 
bivariate, 113 
central term, 194 
mean and variance, 179 
MGF, 179 
Negative hypergeometric distribution, 186 
mean and variance, 186 
Neyman-Pearson lemma, 438 
Neyman-Pearson lemma applied to, 
Bernoulli, 442 
normal, 444 
Noncentral, chi-square distribution, 263 
F-distribution, 269 
t-distribution, 266 
Noncentrality parameter, of chi-square, 
263 
F-distribution, 269 
t-distribution, 266 
Noninformative prior, 409 
Nonparametric = distribution-free 
estimation, 576-577 
methods, 576 
Nonparametric unbiased estimation, 576 
of population mean, 578 
of population variance, 578 
Normal approximation, to binomial, 328 
to Poisson, 330 
Normal distribution = Gaussian law, 
87, 216 
bivariate, 228 
characteristic function, 87 
characterizations, 219, 221, 238 
contaminated, 625, 628 
folded, 426 
as limit of binomial, 321, 328 
as limit of chi-square, 322 
as limit of Poisson, 330 
MGF, 217 
moments, 217-218 
multivariate, 234 
singular, 232 
as stable distribution, 321 
standard, 216 
tail probability, 219 
truncated, 111 

Normal equations, 537 


Odds, 8 
Order statistic, 164 
is complete and sufficient, 576 
joint PDF, 165 
joint marginal PDF, 168 
kth, 164 
marginal PDF, 167 
uses, 619 
moments, 169 
Ordered samples, 21 
Orders of magnitude, o and O notation, 318 


Parameter(s), of a distribution, 67, 
196, 576 
estimable, 576 
location, 196 
location-scale, 196 
order, 79 
scale, 196 
shape, 196 
space, 338 
Parametric statistical hypothesis, 430 
alternative, 430 
composite, 430 
null, 430 
problem of testing, 430 
simple, 430 
Parametric statistical inference, 245 
Pareto distribution, 82, 222 
Partition, 351 
coarser, 352 
finer, 352 
minimal sufficient, 353 
reduction of a, 352 
sets, 351 
sub-, 352 
sufficient, 351 
Percentile confidence interval, 531 
centered percentile confidence interval, 
532 
Permutation, 21 
Pitman estimator, 424
location, 426 
scale, 426 
Pitman’s asymptotic relative efficiency, 632


Pivot, 504 
Point estimator, 338, 340 
Point estimation, problem of, 338 
Poisson DF, as incomplete gamma, 209 
Poisson distribution, 57, 83, 186 
central term, 194 
characterizations, 187 
coefficient of skewness, 82 
kurtosis, 82 
as limit of binomial, 194 
as limit of negative binomial, 194 
mean and variance, 187 
MGF, 187 
moments, 82 
PGF, 187 
truncated, 111 
Poisson regression, 553 
Polya distribution, 185 
Pooled sample variance, 485 
Population, 245 
Population distribution, 246 
Posterior probability, 29 
Principle of, 
equivariance, 420 
inclusion-exclusion, 9 
invariance, 456 
least squares, 537 
Probability, 7 
addition rule, 9 
axioms, 7 
conditional, 26 
continuity of, 13 
countable additivity of, 7 
density function, 49 
distribution, 42 
equally likely assignment, 7, 21 
on finite sample spaces, 20 
generating function, 83 
geometric, 13 
integral transformation, 200 
mass function, 47 
measure, 7 
monotone, 8 
multiplication rule, 27 
posterior and prior, 29 
principle of inclusion-exclusion, 9 
space, 8 
subadditivity, 9 
tail, 72 
total, 28

uniform assignment of, 7, 21 
Probability integral transformation, 200 
Probit regression, 552 
Problem, 

of location, 590 

of location and symmetry, 590 

of moments, 88 
P-value, 437, 481, 599 


Quadratic form, 228 
Quantile of order p = (100p)th percentile, 
79 


Random, 13 
Random experiment = statistical 
experiment, 3 
Random interval, 500 
coverage of, 619 
Random sample, 13, 246 
from a finite population, 13 
from a probability distribution, 13, 246 
Random sampling, 246 
Random set, family of, 500 
Random variable(s), 40 
bivariate, 103 
continuous type, 49, 104 
discrete type, 47 
degenerate, 48 
equivalent, 119 
exchangeable, 120, 149, 255
functions of a, 55 
multiple = multivariate, 99 
standardized, 78 
symmetric, 69 
symmetrized, 121 
truncated, 110 
uncorrelated, 145 
Range, 168 
Rank correlation coefficient, 614 
Rayleigh distribution, 224 
Realization of a sample, 246 
Rectangular distribution, 199 
Regression, 543 
coefficient, 277 
linear, 544 
logistic, 551 
model, 543 
multiple, 543 
Poisson, 552
probit, 552 
Regularity conditions of FCR inequality, 
12 
Resampling, 530 
Risk function, 339, 402 
Robust estimator(s), 631 
Robust test(s), 634 
Robustness, of chi-square test, 631 
of sample mean as an estimator, 628 
of sample standard deviation as an 
estimator, 628 
of Student’s t-test, 629 
Robust procedure, defined, 625, 631 
Rules of counting, 21-24 
Run, 607 
Run test, 607 


Sample, 245-246 
correlation coefficient, 251 
covariance, 251 
DF, 250 
mean, 247 
median, 251 
distribution of, 260 
MGF, 251 
moments, 250-251 
ordered, 21 
point, 3 
quantile of order p, 251, 342 
random, 246 
regression coefficient, 282 
space, 3 
statistic(s), 246, 249 
standard deviation, 248 
standard error, 256 
variance, 247 
Sampling with and without replacement, 
21, 247 
Sampling from bivariate normal, 276 
distribution of sample correlation 
coefficient, 277 
distribution of sample regression 
coefficient, 277 
independence of sample mean vector 
and dispersion matrix, 277 
Sampling from univariate normal, 271 
distribution of sample variance, 273 
independence of X̄ and S², 273
Scale family, 196 
Sequence of events, 11
limit inferior, 11 
limit set, 11 
limit superior, 11 
nondecreasing, 12 
nonincreasing, 12 
Set function, 7 
Shortest-length confidence interval(s), 517 
for the mean of normal, 518-519 
for the parameter of exponential, 523 
for the parameter of uniform, 521 
for the variance of normal, 519 
σ-field, 3
choice of, 3 
generated by a class = smallest, 40 
Sign test, 590 
Similar tests, 454 
Single-sample problem(s), 584 
of fit, 584 
of location, 590 
and symmetry, 590 
Skewness, coefficient of, 82 
Slow variation, function of, 76 
Slutsky’s theorem, 298 


Spearman’s rank correlation coefficient, 614
distribution, 615
Stable distribution, 216, 321 
Standard deviation, 77 
Standard error, 256 
Standardized RV, 78 
Statistic of order k, 164 
marginal PDF, 167 
Stirling’s approximation, 194 
Stochastically larger, 600 
Strong law of large numbers, 308 
Borel’s, 315 
Kolmogorov’s, 315 
Student’s t-distribution, central, 265
bivariate, 282 
moments, 267 
noncentral, 267 
moments, 267 
Student’s t-statistic, 265
Student’s t-test, 484-485
as generalized likelihood ratio test, 467 
for paired observations, 486 
robustness of, 630 
Substitution principle, 386 
estimator, 386 
Sufficient statistic, 343 
factorization criterion, 344
joint, 345 
Sufficient statistic for, Bernoulli, 345 
beta, 356 
discrete uniform, 346 
gamma, 356 
lognormal, 357 
normal, 346 
Poisson, 343 
uniform, 346 
Support, of a DF, 50, 103 
Survival function = reliability function, 
227 
Symmetric DF or RV, 50, 103 
Symmetrization, 121 
Symmetrized rv, 121 
Symmetry, center of, 73 


Tail probabilities, 72 
Test(s), 
α-similar, 453
chi-square, 470 
critical = rejection region, 431 
critical function, 431 
of hypothesis, 431 
F-, 489 
invariant, 453 
level of significance, 431 
locally most powerful, 459 
most powerful, 432 
nonrandomized, 432 
one-tailed, 484 
power function, 432 
randomized, 432 
similar, 453 
size, 432 
statistic, 433 
Student’s t, 506
two-tailed, 484
unbiased, 484 
uniformly most powerful, 432 
Testing the hypothesis of, equality of several 
normal means, 539 
goodness-of-fit, 482, 584
homogeneity, 479 
independence, 608 
Tests of hypothesis, Bayes, 507 
GLR, 463 
minimax, 491 
Neyman-Pearson, 438 
Tests of location, 590
sign test, 590 
Wilcoxon signed-rank, 592 
Tolerance coefficient and interval, 619 
Total probability rule, 28 
Transformation, 55 
of continuous type, 58, 124, 128 
of discrete type, 58, 135 
Helmert, 274 
Jacobian of, 128 
not one-to-one, 165 
one-to-one, 56, 129 
Triangular distribution, 52 
Trimmed mean, 632 
Trinomial distribution, 191 
Truncated distribution, 110 
Truncated RVs, 110 
Truncation, 110 
Two-point distribution, 174 
Two-sample problems, 599 
Types of error in testing hypotheses, 431 


Unbiased confidence interval(s), 523 
general method of construction, 524 
for mean of normal, 524 
for parameter of exponential, 529 
for parameter of uniform, 529 
for variance of normal, 526 

Unbiased estimator, 339 
best linear, 361 
and complete sufficient statistic, 365 
LMVU, 361
and sufficient statistic, 364 
UMVU, 361

Unbiased estimation for parameter of, 
Bernoulli, 365, 364 
bivariate normal, 368 
discrete uniform, 369 
exponential, 369 
hypergeometric, 369 
negative binomial, 368
normal, 365 
Poisson, 363 
Unbiased test, 453 
for mean of normal, 454 
and similar test, 453 
UMP, 453 
Uncorrelated RVs, 145 
Uniform distribution, 56, 197 
characterization, 201 
discrete, 72, 175 
generating samples, 201 
MGF, 199
moments, 199 
statistic of order k, 168, 213 
truncated, 111 
UMP test(s) 
α-similar, 453
invariant, 457 
unbiased, 453 
U-statistic, 576 
for estimating mean and variance, 578 
one-sample, 576 
two-sample, 581 


Variance, 77 
properties of, 77 
of sum of RVs, 148 
Variance stabilizing transformations, 333


Weak law of large numbers, 303, 306 
centering and norming constants, 303 
Weibull distribution, 223 
Welch approximate t-test, 486 
Wilcoxon signed-rank test, 592 
Wilcoxon statistic, 593 
distribution, 594, 597 
generating function, 93 
moments, 597 
Winsorization, 112 